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ABSTRACT 


The  ANAJYS-2  provides  the  Navy  with  a  state  of  the  art  Digital  Signal  Processor. 
The  ANAJYS-2  is  programmed  utilizing  the  Processing  Graph  Methodology  (PGM), 
which  represents  specific  tasks  as  nodes  in  a  graph.  It  utilizes  a  simple  First-Come-First- 
Served  (FCFS)  run-dme  resource  allocation  mechanism  that  supports  large-grain  data  flow 
processing.  While  the  mechanism  is  robust,  easy  to  implement,  and  results  in  low  run¬ 
time  overtiead,  it  is  difficult  to  predict  if  a  given  PGM  will  meet  the  application 
requirements.  Therefore,  an  approach  that  uses  compUe-time  analysis  to  exploit  the 
periodic  arrival  of  data  and  a  priori  knowledge  of  the  amount  of  computation  and 
communication  overhead  is  investigated.  Improvement  in  performance  of  the  machine 
when  the  PGM  graphs  are  restructured  using  this  approach,  called  Revolving  Cylinder 
scheduling,  is  observed;  and  it  is  found  to  be  effective  when  there  is  a  high 
communication  overtiead  or  when  the  PGM  nodes  are  of  uniform  size. 
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1.  INTRODUCTION 


Since  the  advent  of  radar  before  World  War  H,  the  success  of  modem  warfare  has 
depended  on  the  ability  of  a  system  to  process  ”...  electronic  signals  to  detect,  localize, 
attack,  and  counter  increasingly  sophisticated  threats.”  [RICE  90,  pp.  1-2]  The  more 
sophisticated  the  system,  the  more  complex  the  signal  processing  requirements.  Not  only 
does  the  complexity  depend  on  the  number  of  operations  to  be  performed,  but  also  oh  the 
time  interval  available  for  completion  [BELLANGER  84,  pp.  3].  Within  the  realm  of 
modem  warfare,  milliseconds  can  be  the  difference  between  survival  or  death.  The 
ability  to  quickly  interpret  and  disseminate  incoming  target  data  provides  the  user  a 
discernible  edge  during  today's  modem  warl^. 

No  longer  does  the  Navy  need  to  rely  upon  electro-mechanical  processors.  The 
development  of  data  transducers  and  the  advent  of  multi-processors  allow  electrical  signals 
to  be  processed  in  real  time  using  digital  methods  [BELLANGER  84,  pp.  4-5].  But,  if 
digital  signal  processing  machines  are  to  operate  in  real  time,  they  must  operate  at  a  rate 
which  is  closely  related  to  the  sampling  frequency  of  the  signals. 

The  ANAJYS-2  Enhanced  Modular  Signal  Processor  is  utilized  in  many  different 
signal  processing  applications,  from  the  acoustic  system  on  a  P3-C  "Orion”  aircraft  to  the 
BSY-2  Sonar  Suite  on  the  SSN-21  "Seawoir  class  submarine.  Although  different 
configurations  are  possible,  the  search  mission  drives  the  envelope  of  the  processing 
system.  The  United  States'  Navy's  signal  processing  requirements  have  been  increasing 
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since  the  advent  of  electronics  conception  in  the  early  part  of  the  twentieth  century  and 
"...are  expected  to  increase  tenfold  within  the  next  ten  years."  [RICE  90,  pp.  2] 

The  ANAJYS-2  is  meant  to  provide  the  United  States’  Navy  with  a  standard, 
programmable,  modular,  multi-processor  capable  of  meeting  the  digital  signal  processing 
requirements  into  the  twenty-first  century.  Yet,  modifications  will  be  required  if  it  is  to 
maintain  its  goal. 

A.  BACKGROUND 

The  iimovation  of  new  weapon  platforms,  led  by  Autonomous  Underwater  Vehicles 
and  Remotely  Piloted  Vehicles,  and  the  advancement  of  weapon  technology  requires  an 
intelligent  stand-alone  programmable  multi-processor. 

"To  achieve  high  performance  in  a  processor  specialized  for  signal  processing,  the 
need  to  depart  from  the  simplicity  of  von  Neumann  computer  architectures  is  axiomatic.  ” 
[LEE  87,  pp.  24] 

1.  AN/UYS>2  Design  Theory 

Parallel  computations  that  exist  in  signal  processing  can  be  naturally 
represented  as  data-flow  graphs.  The  data-flow  approach  was  first  presented  by  Karp, 
and  it  has  since  been  expanded  by  many  including  Dennis  and  Watson  [KARP  66,  pp. 
1390-1411,  DENNIS  80,  pp.  48-56,  WATSON  82,  pp.  51-57].  These  graphs  not  only 
describe  the  dependencies  between  different  parts  of  the  computation  required  in  an 
application,  but  also  provide  built-in  scheduling  and  synchronization. 
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While  data-Aow  techniques  have  been  applied  to  digital  signal  processing  since 
its  earliest  days.  Navy  sensor  systems  have  continued  to  employ  a  control-Aow  method 
until  the  ANAJYS  series  development  during  the  1980's  [LEE  90,  pp.  333,  RICE  90,  pp. 
2].  Time-line  control,  in  which  a  single  control  signal  generates  program  execution  and 
provides  the  output,  characterizes  a  control-Aow  architecture.  Multi-thread  control  Aow 
architectures  are  achievable  by  use  of  more  than  one  control  signal,  though  it  is  difAcult 
to  develop  programs  that  mold  themselves  to  this  simpliAed  structure. 

Data-Aow  representation  of  digital  signal  processing  algorithms  provides  a 
natural  exploitation  of  concurrence  [LEE  90,  pp.  333].  Typical  data-Aow  algorithms 
execute  a  task  based  upon  the  availability  of  input  data  and  machine  resources,  thereby 
enabling  the  data  to  exist  only  between  its  production  and  consumption  and  eliminating 
the  need  for  a  distinct  program  counter  [RICE  90,  pp.  2]. 

A  distinction  must  be  made  between  large-grain  and  fine-grain  data-Aow 
architectures.  Fine-grain  architectures  have  their  uses  in  Very  Large  Scale  Integration 
as  documented  by  Koren,  Silberman,  and  Dennis  [DENNIS  80,  pp.  48-56,  KOREN  83, 
pp.  335-337].  But,  the  use  of  fine-grain  data-Aow  within  the  modular  design  of  the 
ANAJYS-2  would  produce  unreasonable  communication  overheads.  Therefore,  the 
AN/UYS-2  is  b'lilt  around  the  large-grain  data-Aow  architecture  approach.  Due  to  the 
generality  of  the  data-Aow  paradigm,  it  can  be  used  to  specify  and  exploit  the  parallelism 
at  the  instruction  level  as  well  as  at  the  task  level  [BROBST  87,  pp.  40-45,  SAWKAR 
83,  pp.  344],  The  theoretical  foundation  for  consistency  of  such  representations  has  been 
well  studied  by  Lee,  Karp,  and  Miller  [LEE  87,  pp.  24-35,  KARP  66,  pp.  1390-141 1]. 
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The  focus  of  this  work  is  on  task-level  parallelism  in  such  applications  expressed  using 
data-flow  graphs.  Such  computations  are  also  classifled  as  pipelined  Junction-parallel 
confutations  and  synchronous  data-flow  confutations  [LEE  87,  pp.  24-35].  However, 
the  machine  must  provide  mechanisms  to  manage  the  data  that  flows  through  the  graph 
and  to  capture  the  intrinsic  scheduling  and  synchronization. 

The  ANAJYS-2  utilizes  a  distributed  run-time  operating  system  which 
implements  a  hybrid  control-flow/data-flow  architecture  by  utilizing  the  data-flow 
technique  at  the  task  level  and  the  control-flow  ai^roach  at  the  elementary  processing 
level  [POPS  89,  pp.  2-9,  RICE  90,  pp.  2].  This  helps  to  minimize  the  communication 
costs  involved  while  providing  efficient  elementary  level  execution. 

2.  AN/UYS-2  Design  Problems 

The  mechanisms  involved  in  managing  graph  data  flow,  intrinsic  scheduling, 
and  synchronization,  typically  operating  at  run-time,  result  in  overheads  that  lead  to 
suboptimal  performance. 

The  non-deterministic  (first-come-first-served)  scheduling  strategy  used  by  the 
AN/UYS-2  will  not  be  able  to  maintain  throughput  with  the  rapidly  increasing  data-flow 
graph  bandwidth. 

Increasing  sensor  system  complexity  compounded  by  decreasing  allowable 
reaction  time  is  likely  to  degrade  the  ANAJYS-2  capabilities  considerably. 
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B.  OBJECTIVES 

Since  the  AN/UYS-2  utilizes  a  non-real-dme  strategy  to  schedule  the  application's 
nodes  to  dree  processors,  this  thesis  investigates  a  robust,  compile-time  technique  that 
supports  a  simple  run-time  mechanism  to  improve  throughput  and  predictability  in  the 
AN/UYS-2  architecture  including: 

•  Investigation  of  Digital  Signal  Processing  (DSP)  requirements 

•  Development  of  a  compile-time  algorithm  for  graph  restructuring 

•  Design  and  implementation  of  an  AN/UYS-2  software  simulator 

•  Performance  evaluation  and  comparison  of  the  existing  and  proposed  scheduling 

algorithms  [POPS  90,  pp.  6*3]. 

A  compile-time  approach  is  possible  in  DSP  due  to  the  tremendous  amount  of 
information  that  is  known  about  each  task.  It  simplifies  the  application  developer's  task 
and  results  in  no  change  to  the  run-time  mechanism. 

C.  THESIS  ORGANIZATION 

Chapter  n  describes  the  architecture  of  the  AN/UYS-2  in  detail  and  discusses  the 
program  interface.  The  modular  design  and  actual  components  are  elaborated.  Chapter 
m  studies  the  requirements  of  signal  processing  applications,  examines  the  current 
AN/UYS-2  scheduling  strategy,  and  proposes  a  deterministic  scheduling  approach.  C+  + 
code  for  the  deterministic  algorithm  is  provided.  A  simple  graph  is  scheduled  as  an 
example.  (Chapter  IV  describes  the  simul^or  constructed  for  this  thesis  and  its 
limitations.  Background  information  and  simulator  specifics,  including  coding  techniques, 
are  also  provided.  Chapter  V  analyzes  the  performance  of  the  non-deterministic  and 
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deterministic  algorithms  by  utilizing  two  key  signal  processing  examples.  Chapter  VI 
provides  a  summary  of  the  overall  work,  and  gives  recommendations  for  future  expansion 
and  improvement. 


n.  ARCHITECTURE  AND  PROGRAMMING  OF  THE  AN/UYS-2 

This  chapter  describes  the  architecture  of  the  AN/UYS-2  in  detail  and  discusses  how 
the  AN/UYS-2  is  programmed  to  achieve  its  desired  results.  The  data-flow  principle  is 
examined  from  the  view  point  of  the  modular  components. 

A.  ARCHTTECTURE 

Different  DSP  applications  have  specific  processing  requirements.  The  Navy's 
diverse  operating  environment,  stringent  operational  tempo,  and  sundry  weapon  platforms 
contribute  heavily  towards  the  requirement  for  a  modular  based  design.  In  addition, 
specific  reliability  criteria  required  in  today's  weapon  systems  are  easier  to  implement  and 
maintain  in  a  modular  system.  Within  the  last  twenty  years,  ”...  substantial  reductions 
in  the  cost  of  digital  computation  have  occurred  accompanied  by  an  improvement  in 
performance,  speed,  memory  capacity,  and  ease  of  programming  ...”  making  modular 
design  even  more  attractive  [BEAUCHAMP  79,  pp.  iv].  Rapidly  changing  hardware 
technology  and  DSP.  requirements  are  easier  to  implement  in  a  modular  design.  The 
AN/UYS-2  architecture  provides  for  many  modular  machine  configurations,  tailoring 
itself  to  a  set  application,  while  maintaining  a  cohesiveness  and  implementation  ease  as 
an  overall  system.  [POPS  90,  pp.  2-6]. 

The  AN/UYS-2  comes  packaged  in  two  Standard  Electronic  Module  (SEM)  type 
implementations  which  enable  the  unit  to  be  configured  for  either  ship  based  or  aircraft 
mounting.  The  standard  SEM  "B"  cabinet  format  and  the  smaller  SEM  "E"  format  which 
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optimizes  the  unit  with  regards  to  size  and  weight,  making  it  ideal  for  aircraft 
dissemination.  Within  each  SEM  class,  the  AN/UYS-2's  components  are  constructed  on 
very  high  speed  int^rated  circuit  cards.  Specific  hardware  implementation  is  documented 
by  Rice.  [RICE  90,  pp.  4  -7] 

1.  Modularity 

The  AN/UYS-2's  modular  design  is  based  on  six  functional  element  (FE) 
types:  the  Scheduler  (SCH),  the  Arithmetic  Processors  (AP's),  the  Global  Memories 
(GM’s),  the  Input/Output  Processors  (IOP*s),  the  Command  Program  Processor  (CPP), 
and  the  Input  Signal  Conditioner's  (ISC's);  and  two  data  paths:  the  Control  Bus  (CBUS) 
and  the  Data  Transfer  Network  (DTN)  as  shown  in  Figure  1 .  These  six  FE  types  can  be 
fused  into  any  combination  as  long  as  the  requirements  specified  in  Table  I  are 
maii..ained,  thereby  enabling  the  AN/UYS-2  to  be  tailored  to  a  specific  application 
(APPLIC  90,  pp.  2-3].  Each  FE  performs  certain  system  tasks.  The  AN/UYS-2  layout 
enables  the  incorporation  of  new  FE's  into  the  modular  structure  as  long  as  the 
communication  interface  to  the  DTN  and  CBUS  remains  constant. 

These  modular  FE' s  perform  parallel  computation  by  associating  "...  each  step 
of  the  algorithm  with  a  node  of  a  directed  graph."  [KARP  66,  pp.  1390]  As  data  for 
these  nodes  becomes  available,  each  node  must  be  scheduled  to  execute  on  an  AP.  This 
is  accomplished  by  the  scheduler.  The  actual  programming  approach  is  discussed  later 
in  this  chapter. 


1.  Data  to 
and  from 
External 
Worid 


2.  User 
Interfj 


GM1  GM2 


CPP  AP1 


SCk-i  lOPk 


rigore  1:  The  AN/UYS-2  Architecttire 

Table  I:  FUNCTIONAL  ELEMENT  REQUIREMENTS 


Minimum 

Maximum 

Number  of  AP's 

1 

59  -  lOP's  -  ISC's 

Number  of  GM's 

1 

30 

Number  of  SCH's 

1 

1 

Number  of  lOP's 

1  OR 

59  -  AP's  -  ISC's 

Number  of  ISC's 

1 

59  -  AP's  -  ISC's 

Number  of  CPP's 

1 

1 

Total  Number 

5 

63 
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a.  The  Scheduler 


The  SCH  perfonns  the  node  scheduling  operation  of  matching  a  ready 
node  to  a  free  AP  by  maintaining  four  tables:*  the  ready-node  list,  the  free  processing 
element  (PE)  list,  the  node  status  table,  and  the  queue-to-node  table  [RICE  90,  pp.  4, 
POPS  90,  pp.  6-3].  The  SCH  tea-*  »res  queue  information  fiom  the  GM's.  As  queues 
exceed  threshold  levels,  the  GM's  send  queue  over  threshold  messages  to  the  SCH  via 
the  CBUS.  Since  the  SCH  also  receives  AP  availability  information  via  the  CBUS,  when 
all  of  a  graph  node's  queu'^  are  over  threshold,  it  then  attempts  to  match  free  AP's  to 
ready  graph  nodes.  If  a  match  is  successful,  and  that  graph  node  is  not  currently 
executing,  scheduling  data  is  sent  to  the  GM's  in  the  form  of  a  message  via  the  Control 
Bus  (CBUS);  and  the  database  tables  are  updated  to  reflea  the  match.  If  a  match  is 
unsuccessful,  the  node  status  table  and  ready  node  list  are  updated  to  indicate  the  new 
ready  node.  When  an  AP  indicates  to  the  SCH  that  it  is  Ready  For  Instruction  Stream 
(RFIS),  the  SCH  checks  the  ready  node  list  for  nodes  ready  to  execute  and  assigns  one 
to  the  now  available  AP.  [POPS  90,  pp.  6-3] 

b.  The  Global  Memories 

The  GM  provides  the  doa  storage  for  the  AN/UYb-2  and  executes 
memory  management  primitive  functions.  Each’ Processing  Graph  Methcxiology  (PGM) 
queue,  discussed  later  in  this  Chapter,  is  allocated  to  a  GM  for  storage,  and  that  GM 
maintains  that  queue's  state  information.  Queue  state  information  consists  of  read, 
produce,  consume,  and  threshold  quantities,  queue  sLe,  and  a  queue  identification 
number.  When  a  queue  exceeds  threshold  or  capacity  values,  or  returns  under  threshold. 


the  GM's  notify  the  SCH  via  the  CBUS.  Upon  receiving  instructions  from  the  SCH,  the 
GM's  send  the  ^)propriate  execute  messages  to  the  AP's  and  lOP's  via  the  CBUS  and 
DTN.  The  GM's  maintain  the  control  variables  and  node  information  necessary  for  the 
AP's  to  execute  the  node  successfully.  After  the  AP  has  completed  node  breakdown  and 
informed  the  GM,  the  GM  consumes  the  input  data  involved  by  freeing  the  storage 
previously  used  by  those  queues  and  notifies  the  SCH  of  the  current  queue  level.  [POPS 
90.  pp.  7-9] 

c.  Ihe  Arithmetic  Processors 

After  the  SCH  sends  the  node  execute  instruction  to  the  GM’s,  the  GM’s 
relay  the  information,  by  another  message  via  the  CBUS,  to  an  AP.  The  AP  then 
executes  the  actual  signal  processing  primitives.  The  message  from  the  GM's  contains 
all  of  the  required  information  for  the  AP  to  read  the  necessary  queue  data  from  the 
GM’s  (set-up),  execute  the  designated  primitives,  and  write  the  output  queue  data  back 
to  the  GM's  (breakdown).  The  AP's  consist  of  an  Arithmetic  Unit  (AU)  and  a  Control 
Unit  (CU);  consequently,  three  nodes  can  be  associated  with  an  AP  at  any  one  time: 

•  The  tint  node  being  setup  within  the  CU 

•  The  next  node  executing  within  the  AU 

•  The  third  node  performing  breakdown  within  the  CU. 

When  the  AP  has  completed  execution  of  its  current  node  and  setup  of  its  next  node,  it 
notifies  the  SCH  via  the  CBUS  that  it  is  ready  to  process  the  following  node.  The  ability 
of  the  CU  to  concurrently  perform  setup  and  breakdown  on  distinct  nodes  and  to  notify 
the  SCH  without  the  AU's  knowledge  helps  to  increase  the  concurrence  involved  with 
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minimum  loss  of  through-put  due  to  the  non-availability  of  resources.  [RICE  90,  pp.  5, 
POPS  90,  pp.  9-8] 


d.  The  Input/Output  Pmcessors 

The  lOP's  provide  for  raw  digitized  sensor  data  input  and  processed  data 
ouQnit  from  the  AN/UYS-2  by  buffering  and  formatting  the  input  and  ouQ)ut  data  to 
synchronize  the  different  external  device  data  rates  with  the  internal  network.  Upon 
arrival  of  sufficient  external  data  as  determined  by  input  buffer  size,  the  lOP's  dispatch 
the  input  data  to  the  GM  queues  after  an  amount  specified  by  the  application  programmer 
is  received.  The  lOP's  receive  the  output  data  from  the  GM's  and  redirect  it  to  the 
external  world.  [POPS  90,  pp,  8-1] 

e.  The  Input  Signed  Conditioner 

In  addition  to  performing  the  same  functions  as  the  lOP's,  the  ISC's 
perform  signal  conditioning  operations  to  reduce  input  data  bandwidth  and  generates 
output  for  sensor  control.  The  ISC's  are  capable  of  performing  analog-to-digital 
conversions  during  the  input/output  process.  [POPS  90,  pp.  10-1] 

2.  Cohesiveness 

Despite  the  modularity  of  the  AN/UYS-2,  key  elements  of  its  structure  mold 
it  into  a  cohesive  unit.  The  cohesiveness  is  built  around  the  ability  of  the  modules  to 
communicate  and  interact  with  each  other.  The  communication  among  FE's  occurs  over 
the  Data  Transfer  Network  and  Control  Buses.  All  interaction  between  the  user  and  the 
AN/UYS-2  occurs  in  the  Command  Program  Processor. 
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a.  Control  Buses 


The  FE  built-in-test  control  bus  (BITCBUS)  provides  a  means  with  which 
the  Command  Program  Processor  (CPP)  can  test  the  AN/UYS-2  system,  and  the  CBUS 
provides  the  main  means  by  which  the  modular  FE's  communicate  short  messages  of 
data-flow  control  information  [RICE  90,  pp.  5]. 

b.  Data  Tranter  Network 

The  DTN  provides  for  up  to  16  simultaneous,  asynchronous, 
unidirectional  32-bit  data  transfers  among  FE's  by  continuously  polling  the  data  sources. 
When  a  source  requests  a  transfer  to  a  destination  that  is  not  already  recdving  data,  the 
DTN  path  is  established  and  the  transfer  conducted.  Possible  DTN  configurations  include 
2,  4,  8,  or  16  input  and  output  ports;  each  port  can  have  up  to  four  further 
disseminations.  [POPS  90,  pp.  4-1] 

c.  Command  Program  Processor 

The  CPP  provides  the  glue  that  holds  the  entire  AN/UYS-2  architecture 
together.  The  CPP  acts  as  the  overall  control  unit  for  the  AN/UYS-2  by  performing  the 
following  functions: 

•  Data-flow  graph  management 

•  Tactical  interface 

•  Input/output  configuration  control 

•  System  performance  monitoring  [POPS  90,  pp.  5-1]. 
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B.  Programming 

New  programming  methodologies  are  required  for  data-flow  architectures  because 
data-flow  computers  depart  from  conventional  architectures  [DENNIS  83,  pp.  331].  The 
AN/UYS-2  is  programmed  using  the  Enhanced  Modular  Signal  Processor  Common 
Operational  Support  Software,  which  refers  to  the  implementation  of  the  PGM  on  the 
ANAJYS-2  [POPS  90,  pp.  2-8,  ECOS  89,  pp.  1-25]. 

1.  Graphical  Interface 

The  PGM  yields  a  convenient  means  of  writing  application  software  without 
concern  for  the  specific  architecture  of  the  machine  on  which  it  would  run  [PGMTUT  90, 
pp.  1-1].  The  PGM  provides  the  application  programmer  with  a  high  level  graph 
oriented  language,  provides  a  method  of  translating  these  graphs  inh>  load  modules  that 
the  ANAJYS-2  can  recognize,  and  furnishes  a  nm-time  support  environment  which 
expands  graph  realizations  and  manages  execution  of  graph  instances  [APPLIC  90, 
pp.  2-5]. 

A  PGM  application  consists  of  a  direaed,  acyclic  graph  with  nodes 
representing  large  grain  computations  called  primitives,  which  are  chosen  from  a  self- 
contained  library  of  signal  processing  functions.  A  simple  example  PGM  appears  in 
Figxire  2.  The  edges  of  the  graph  represent  queues  which  receive  data  from  the  source 
primitive  and  supply  data  to  the  destination  primitive.  Conversion  of  the  PGM 
description  into  a  executable  data-flow  graph  entails  only  specifying  the  read,  produce, 
consume,  and  threshold  values.  The  AN/UYS-2  is  designed  to  run  several  signal 
processing  applications  simultaneously.  Therefore,  several  instances  of  multiple  PGM's 
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may  be  running  at  the  same  time.  The  graph  is  loaded  into  the  AN/UYS-2  by  the  CPP 
prior  to  graph  initialization. 

2.  Graph  to  Program  Conversion 

Since  the  AN/UYS-2  is  unable  to  recognize  the  PGM  graph  directly  as  shown 
previously  in  Figure  2,  the  graph  must  be  converted  to  a  key-word  program.  This 
conversion  is  typically  performed  by  machine  utilizing  the  guidelines  supplied  in  the 
Application  Programmer  User  Manual  [APPLIC  90,  pp.  all].  A  possible  program  for  the 
graph  of  Figure  2  appears  in  Figure  3.  The  node  primitives  that  appear  within  the  nodes 
are  documented  in  the  primitive  library  [PRIMLIB  90,  pp.  all].  The  name  of  the  graph 
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is  specified  along  with  its  external  input  and  output  queues.  Constants  are  then  initialized 
followed  by  a  description  of  all  queues  and  nodes.  A  queue  is  described  by  a  name 
followed  by  a  and  a  type  name.  A  node  is  completely  described  by  a  name, 
primitive,  input  queues  ”ptim_in",  and  output  queues  "prim^out.”  The  constructed 
pr(^;iam  is  then  loaded  into  the  ANAJYS-2  and  executed  by  the  CPP  following  the  data¬ 
flow  paradigm. 


graph  (SeunplePGMGraph 

INPUTQ-Hydrophone  :  fixed (1) 
OtrrPUTQ«BeamOutl ,  BeamOut2  :  int) 

GIF  scan  :  int  Initialize  to  4096 
%Queue  (FHyd  :  dfloat) 

%<2ueue  ( [  1 . .  2  ]  Rep  :  dfloat ) 

Node  (BeamForm 

priniitive=BFR__FREQ 

pr iiB_inascan ,  Hydrophone  thresho  ld»scan 
prim” out»FHyd) 

%Node  (Replicate 

primitive»DFC_REP 

pr im_in»l , 2 , FHyd  threshold*! 

prim” out* ( 1 . . 2 ] Rep) 

%Node  (FFTl” 

primitive*FFT_CR 

prim_in*scan, [l]Rep  thresho ld*scan 
pr  im_out*Be2uiiOutl ) 

%Node  (FFT2 

primitive*FFT_CR 

pr im_in*scan , [ 2 ] Rep  threshold*scan 
pr im_out*BeamOut2 ) 
endgraph 


Figure  3:  Converted  PGM  Program 


ffl.  SCHEDULING  OF  PGM  ON  THE  ANAJYS-2 
This  chapter  diagrams  the  specific  scheduling  of  PGM  on  the  AN/UYS-2  by 
examining  the  signal  processing  requirements  and  the  methods  of  resource  allocation. 
The  current  First-Come-First-Served  (FCFS)  implementation  and  the  proposed  real-time 
scheduling  algorithm  are  examined  in  detail. 

A.  SIGNAL  PROCESSING  REQUIREMENTS 

Due  to  DSP's  concurrency  and  high  throughput  requiiemoits,  large-grain  data-flow 
programming  models  can  be  used  effectively  to  exploit  the  intrinsic  parallelism  P’ARHI 
88,  pp.  178].  General  data-flow  processing  requires  a  direct  hardware  implementation 
of  the  data-flow  paradigm  [GURD  85,  pp.  34-52,  BROBST  87,  pp.  40-45].  This  results 
in  unmanageable  overheads.  However,  for  specific  classes  of  applications,  such  as  signal 
processing,  data-flow  can  be  managed  very  efficiently  at  the  macro  level  to  obtain 
significant  performance  improvement.  This  is  due  to  the  ability  of  representing  digital 
signal  processing  applications  as  synchronous  data-flow  graphs  [LEE  87,  pp.  24]. 

Executing  data-flow  descriptions  of  DSP's  applications  on  parallel  processors 
requires  decisions  about  allocation,  ordering,  and  data  movement.  In  the  AN/UYS-2 
context,  allocation  refers  to  the  assignment  of  PGM  nodes  to  resources.  GM's  and  AP's 
constitute  the  majority  of  the  resources.  Ordering  relates  the  node  assignment  and 
execution  sequence  on  these  resources.  Data  movement  reflects  the  method  and  amount 
of  data  internally  transferred  in  between  executions.  Since  large-grain  data-flow 
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architectures  inherently  possess  a  high  decision  making  overhead,  the  data-flow  principle 
must  be  allowed  to  take  effect  in  order  to  capture  the  concurrency  and  minimize  the 
decision  making  overhead.  Synchronous  applications  assist  in  limiting  this  overhead. 
The  three  properties  of  these  applications  that  make  this  possible  are  availability  of  a 
priori,  knowledge  of  the  amount  of  data  produced  and  consumed,  known  function 
execution  times,  and  negligible  use  of  conditionals  and  recursion.  [LEE  87,  pp.  25-31] 

1.  Desirable  Characteristics  In  Execution 

In  real-time  signal  processing  applications,  the  principle  reqtiirements  are 
predictability  and  performance  as  measured  by  throughput.  An  additional  characteristic 
imposed  by  the  AN/UYS-2  is  on-line  reconfiguration.  Since  the  ANAJYS-2  is  a  data¬ 
flow  architecture,  this  predictability  is  critical  in  Navy  sensor  systems.  Given  the  non¬ 
determinism  of  large-grain  data-flow  model  and  a  set  number  of  available  resources  for 
computational  assignment,  how  can  deterministic,  rate-optimal  through-put  be  guaranteed? 
While  this  question  has  ^wned  complete  design  environments  like  Gabriel,  designed  by 
Lee,  the  answer  undoubtedly  lies  in  resource  allocation  [LEE  89,  pp.  333->33S].  Efficient 
resource  allocation  and  low  communications  overhead  lead  to  the  high  through-put, 
deterministic  output  required  by  DSP. 

2.  Resource  Allocatiou 

Resource  allocation  forms  the  basis  of  the  system  designer's  dilemma.  Based 
on  how  a  graph  node  and  arc  attributes  are  used  at  compile  time  and  how  much  control 
information  is  generated  to  aid  the  run-time  mechanisms,  data-flow  scheduling 
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implementations  can  be  classified  over  a  spectrum  that  ranges  homjully-dynamic  through 
Julfy-static.  Although  allocation  can  be  fidfy-dynamic,  self-timed,  static,  orjulfy-static, 
typical  Hasigns  iiHliTe  a  Combination  of  these  allocation  methods.  Fidty-dynamic 
anncatipn  performs  all  scheduling  of  nodes  at  run  time  based  on  the  readiness  of  inputs. 
In  self-timed  allocation  systems,  the  compile  determines  the  order  of  node  execution  and 
aiinratas  tesouTces,  but  the  filing  is  determined  ^  run-time  by  data  arrival.  Static  node 
allocation  involves  the  alignment  of  a  node  to  a  processor,  but  the  order  of  execution 
is  left  up  to  the  run  time  scheduler  based  on  the  node's  inputs.  In  Jully-static  allocation, 
the  compiler  determines  the  exact  firing  time,  assignment,  and  ordering  of  nodes  based 
on  that  node's  predicted  behavior.  [LEE  90,  pp.  333-334] 

Quasistatic  scheduling  methods  are  based  on  ordering  memory  accesses  by 
blending  the  static  and  self-timed  methods  into  a  hybrid  solution  that  supports  Von 
Neumann  Architectures  without  the  need  for  specifically  designed  data-flow  machines. 
Quasistatic  scheduling  maintains  the  ordering  of  nodes  on  processors  while  preserving  the 
ordering  of  accesses  to  other  shared  system  resources.  Since  quasistatic  scheduling 
incorporates  self-timed  methods,  node  execution  time  can  vary  without  affecting  the 
results.  Lee  et  al.  have  proposed  utilizing  hardware  semaphore  to  overcome  the  high 
communication  overtieads  generated  by  quasistatic  methods  [LEE  90,  pp.  334-338]. 

Many  different  processor  allocation  schemes  have  been  proposed  that  are 
inadequate  for  actual  data-flow  systems  [DAVIS  79,  pp.  1079-1086,  ARVTND  80,  pp. 
7-14,  MUNDELL  81,  pp.  156-157].  Systems  like  Gabriel,  characterized  by  Lee,  utilize 
compile  time  static  resource  assignment  and  work  well  on  sequential  data-flow  graphs. 
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whose  very  nature  enables  nodes  to  be  scheduled  at  compile  time  rather  than  run  time, 
relying  on  the  graph  structure  for  enforcement.  P»iodic  Admissible  Parallel  Schedule 
(PAPS),  also  proposed  by  Lee,  assumes  sequential  data-flow  and  establishes  precedence 
links  between  nodes  in  the  graph  and  synchronizes  these  links  at  every  graph  instance 
[LEE  87,  pp.  25-28].  Quasistatic  methods  ensure  run  time  enforcement  by  utilizing 
dynamic  control  [LEE  87,  pp.  25-28].  While  algorithms  proposed  by  Ho  and  Irani 
maintain  concurrency,  they  do  not  guaran.ee  deterministic  throughput  [HO  83]. 

As  mentioned  previously,  the  AN/UYS-2  possesses  several  sets  of  resources: 
the  AP's,  GM’s,  lOP's,  ISC's,  DTN,  and  CBUS.  lOP’s  and  ISC's  are  hardwired  into 
a  certain  configuration  for  the  external  world.  GM's  are  allocated  at  compile  time  when 

the  graph  is  initiated,  and  are  assigned  based  solely  on  the  graph  queue  structure  with  all 

» 

input  queues  associated  with  a  node  being  assigned  to  the  same  GM  if  possible.  AP's  are 
allocated  to  execute  specific  graph  nodes  by  the  scheduler  as  they  become  ready  on  a 
first-come-first-served  basis  at  run  time.  The  DTN  and  CBUS  are  configured  and 
hardwired  for  its  specific  structure  during  assembly,  but  individual  messages  are  not 
allocated  to  it  until  run  time.  [POPS  90,  pp.  3-8] 

B.  FIRST-COME-FIRST-SERVED  SCHEDULING 

The  simple  nature  of  the  SCH  is  motivated  by  the  requirements  that  it  should  not 
be  a  bottleneck,  that  it  should  maintain  a  high  and  balanced  AP  utilization,  that  it  should 
incorporate  multiple  2q)plications  simply,  and  that  it  should  behave  well  during 
reconfiguration.  The  SCH  implements  First-Come-First-Served  (FCFS)  scheduling  by 
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maintaining  a  database  consisting  of  PGM  graph  node  and  AP  information.  Available 
AP's  are  matched  with  ready  nodes  on  a  continuing  basis,  interrupted  only  by  user 
reconfiguration  of  the  system.  If  the  SCH  is  unable  to  execute  a  match,  either  due  to  no 
free  AP's  or  no  ready  nodes,  then  the  SCH  maintains  a  free  AP  list  and  a  ready  node  list. 
The  SCH  matches  the  elements  from  the  lists  together  in  the  same  FCFS  manner  as  they 
become  available. 

1.  Advantages 

The  simplicity  of  FCFS  scheduling  earns  it  the  designation  as  the  most 
attractive  scheduling  algorithm.  The  FCFS  scheduling  algorithm's  low  run-time  overhead 
costs  also  lends  itself  to  the  AN/UYS-2.  This  simple  algorithm  ensures  close  to  the 
maximum  possible  AP  utilization  since  nodes  are  sent  to  AP's  as  soon  as  they  become 
available. 

2.  Disadvantages 

The  major  disadvantage  within  the  ANAJYS-2  scheduling  arises  from 
unpredictability  in  EMSP  output  arrival.  The  dynamic  assignment  of  AP's  to  ready  nodes 
supers  from  the  following  intrinsic  disadvantage  when  data  arrives  periodically  from  the 
external  world.  The  nodes  that  depend  only  upon  the  receipt  of  external  data  get  ready 
for  execution,  and  therefore  enter  the  ready  list  at  a  rate  which  is  independent  of  the  other 
nodes.  If  the  graph  latency  is  longer  than  the  data  arrival  period,  nodes  in  the  lower 
portion  of  the  graph  get  ready  after  nodes  in  the  upper  portion.  Since  the  machine 
follows  a  FCFS  node  to  AP  assignment  strategy,  the  top  nodes  execute  at  a  higher  rate 
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than  the  nodes  lower  down  in  the  graph.  The  lower  graph  nodes'  execution  will  catch 
up  only  as  the  upper  nodes'  output  queues  exceed  capacity  and  prevent  the  top  nodes 
from  entering  the  ready  list.  As  a  result,  processed  output  data  arrives  unpredictably 
leading  to  the  possibility  of  intolerable  delays  and  insufficient  buffer  space  especially 
under  high  loads.  This  non-uniformity  in  output  arrival  arises  because  the  task-level  data¬ 
flow  mechanism  does  not  allow  any  mechanism  to  control  the  input  nodes  [SHUKLA  91 , 
pp.  222-231].  It  will  be  present  when  the  resource  allocation  at  all  stages  of  the  parallel 
machine  is  not  coupled  to  the  input  data  arrival  and  has  been  observed  due  to  FCFS 
scheduling  [SHUKLA  91,  pp.  222-231]. 

3.  A  Simple  Example 

The  PGM  graph  shown  in  Figure  4,  a  simple  graph  with  six  nodes  whose  AP 
execution  time  ate  shown  inside  the  vertices,  can  be  used  to  demonstrate  FCFS 
scheduling.  For  simplicity,  we  neglect  the  set-up  and  break-down  times  associated  with 
each  node.  Consider  that  an  AN/UYS-2  with  two  AP's,  each  of  which  has  a  processing 
speed  of  one  unit,  should  be  able  to  attain  a  maximum  data  rate  and  start  a  new  graph 
instance  every: 

NumberofAP^s  2 

Assume,  for  simplicity  of  explanation,  that  data  arrives  at  this  exact  rate.  A  possible 
resulting  order  of  execution  utilizing  FCFS  scheduling  is  documented  in  Table  II.  The 
non-uniformity  that  exists  in  the  instance  completion  times  is  inherently  obvious  in  this 
table  by  examining  the  clock  cycles  for  node  "f.”  It  should  be  noted  that  this  is  only  one 
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possible  FCFS  scheduling  order.  Different  orders  are  possible  due  to  set-up  and 
breakdown  delays  affecting  the  node  arrival  order  at  the  SCH. 

C.  REVOLVING  CYLINDER  SCHEDULING 

While  the  simple  dispatcher  works  quite  well  as  a  run-time  mechanism,  it  does  not 
yield  easily  to  compile-time  analysis.  This  technique  restructures  the  graphical  application 
description  by  performing  compile-time  analysis  of  the  application  execution  pro^e. 
Given  a  graph,  it  is  possible  to  systematically  determine  whether  it  can  be  mapped  on  a 
certain  number  of  AP's  while  satisfying  the  required  data  rates.  It  can  be  proved  that  the 
graph  can  be  scheduled  (ignoring  overheads)  such  that  the  consecutive  graph  instances  are 
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Tabl«  II;  EXEOJTION  OF  36  CYCLES  UTILIZING  FCFS  SCHEDULING 
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e2 
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C4 

35 

d6 

e6 

12 

a3 

e2 

24 

e4 

a5 

36 

d6 

e6 

separated  by  a  time  equal  to  the  total  execution  time  of  the  PGM  divided  by  the  number 
of  AP’s.  Since  the  AP's  will  be  fully  utilized,  this  time  corresponds  to  the  maximum 
throughput  rate.  The  key  idea  in  the  Revolving  Cylinder  (RC)  assignment  is  that 
inserting  delays  in  the  PGM  can  produce  a  graph  with  better  throughput.  The  idea  behind 
RC  scheduling  can  be  traced  back  to  algorithms  for  overlapping  complex  operations  on 
pipelined  processors  [RAU  82,  pp.  131-139,  SHUKLA  91,  pp.  222-231]. 
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1.  Imptementation 


RC  scheduling  recommends  when  a  graph  node  should  be  scheduled,  but 
choosing  the  AP  to  schedule  it  on  is  left  to  the  run-time  dispatcher.  This  enables  the 
actual  scheduling  to  remain  dynamic.  Assume  that  there  is  a  cylinder  whose 
circumference  is  the  sum  of  all  of  the  nodes'  execution  times  divided  by  the  number  of 
AP's  in  the  AN/UYS-2  structure.  The  idea  is  to  schedule  the  graph  such  that  it  wraps 
around  the  cylinder,  thereby  causing  its  end  to  meet  its  beginning.  The  separation  of 
b^inning  from  end  has  the  effect  of  a  divide-by-circumference  counter  every  time  the 
beginning  meets  the  end. 

Each  slot  in  the  cylinder  is  of  width  equal  to  the  smallest  size  node  in  the 

* 

graph.  For  each  node  in  the  graph,  starting  with  the  top  and  working  towards  the 
bottom,  attempt  to  schedule  the  node  at  its  earliest  start  time.  If  it  can  not  be  inserted  at 
that  time,  delay  the  start  time  by  the  width  of  a  slot  and  repeat  until  it  can  be  inserted. 
Adjust  the  earliest  start  time  of  all  descendants  of  that  node  and  repeat  the  above  sequence 
with  the  next  node  as  the  top  node  in  the  graph.  This  ensures  that  maximum  cylinder 
usage  will  result  when  the  cylinder  is  filled  by  these  algorithms  shown  in  Figure  5. 

Once  all  nodes  have  been  inserted  into  the  cylinder  and  the  cylinder  is  full, 
assign  dependencies  to  the  nodes  based  upon  their  location  in  the  cylinder.  For  each 
entry  relegated  to  an  AP  in  the  cylinder,  if  there  are  other  nodes  assigned  to  the  same  AP 
with  the  same  index  and  the  node  higher  up  in  the  cylinder  is  not  an  ancestor  of  the 
other  then  create  a  dependency  from  the  higher  node  to  the  lower.  This  algorithm  is 
shown  in  Figure  6.  Figure  7  shows  a  possible  restructuring  resulting  from  these 


25 


procedure  A3sign__RC(G,P)  ;  /*G  is  a  directed  acyclic  graph*/ 

”  /*P  is  the  number  of  AP's*/ 

q  topological  sort(G)  ;/*0(e) ,  q  is  a  queue*/ 
for  all  nodes  n^^ 

est(n£}  *-  0;  /*est  is  the  earliest  start  time*/ 

circumference  ^  0; 
for  all  nodes  n^^ 

circumference  ^  circumference  +  w(n^) ; 

/*v(nj^)  is  the  size  of  the  node  n^^*/ 
circximference  «-  f  (circumference/P)]  ; 
while  q  is  not  empty 
temp  ^  remove_top(q) ; 

if  sufficient  space  available  in  cylinder 

t  4-  aohedule_node(temp,  est(temp},  cylinder); 
for  all  descend2mts  of  temp 

est(descendent)  4-max(est(descendent)  ,t  +  w(temp) ; 

else 

cylinder  ♦-  increase^cylsize (cylinder, circumference, w) 
end (if)  " 

end (while) 

headofdepqs  ereate^deps (cylinder, circumference, w) ; 

* 

procedure  aehedule_node (temp , t , cylinder) 
scheduled  ^  false; 
while  not  scheduled 

Attempt  to  insert  at  time  t*  =«  t  mod  circumference; 
/*inseirt  if  space  available  at  that  time  in  cylinder*/ 
if  inserted 

schedule  true; 

else 

t'  «-  (f  +  w(temp))  mod  circumference; 
end (while) 

return  t'; _ 

Figure  5:  An  Algorithm  to  Perform  RC  Assignment 


algorithms  in  the  graph  of  Figure  4.  It  should  be  noted  that  different  schedules  which 
sustain  the  maximal  load  could  be  obtained  for  any  graph.  Not  all  of  the  dependency 
delays  represented  by  the  slim  arrows  will  be  generated  by  the  algorithms.  Any 
assignment  of  nodes  on  the  surface  of  the  cylinder  such  that  no  node  is  preempted,  and 
no  two  nodes  are  mapped  to  the  same  square  is  valid.  Actual  code  implementation  for 
the  RC  algorithms  presented  is  given  in  Appendix  A. 
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irocediire  Cxeat«_d«ps(cvl,circtiin, width) ; 
for  all  AP(1) 's 
t  *-  0; 

vhila  t  <  circiim 
t'  0; 

whila  t*  <  clrcum 

if  ( (index (cyl[i] [t])  -  index (cyl[ i] [f ]) )  and 
(cyl(i][t]  is  not  an  ancestor  of  cyl[i][t']) 
and  (a  dependency  does  not  already  exist  here) ) 
add  a  dependency  for  cyl[i][t]  to  cyl[i][t']; 
t'  ^  t'  +  width; 
end (while) 
t  t  +  width; 
end (while) 
end  (for) 

add  a  dependency  from  a  cyl[i]  [circtua]  to  every  input  node 
return  pointer  to  the  head  of  the  list  of  dependencies 


rigure  6:  An  Algorithm  to  Assign  Dependencies 


I 


Implementation  of  the  RC  database  and  its  incorporation  into  the  AN/UYS-2 
requires  minimal  code  modification.  The  only  addition  to  the  SCH  code  involves  function 
calls  to  set,  adjust,  and  verify  the  tokens  used  to  represent  the  dependencies.  When  the 
originating  node  of  a  dependency  is  due  to  be  scheduled,  the  token  associated  with  that 
node  in  the  dependency  list  is  incremented  one  unit.  Conversely,  when  the  destination 
node  of  a  dependency  is  ready  to  be  scheduled,  the  token  with  that  node  in  the 
dependency  list  is  decremented  one  unit.  Forward  graph  dependenci.»  are  initially  set  to 
zxxo,  and  backward  graph  dependencies  are  initially  set  to  one.  This  enables  the  initial 
nodes  to  execute  at  least  once  prior  to  being  inhibited  by  dependencies.  The  token  code 
is  provided  in  Appendix  A  and  the  SCH  code  modifications  in  Appendix  F. 

2.  Advantages 

The  availability  of  multiple  schedules  which  could  sustain  the  same  throughput 
has  an  important  advantage  with  respect  to  the  AN/UYS-2:  nodes  can  be  grouped  together 
on  the  surface  of  the  cylinder  so  as  to  introduce  optimization  to  minimize  the  loss  of  AP 
cycles  due  to  such  overheads  as  set-up  and  break-down  times.  There  are  several  other 
advantages  of  this  node-AP  assignment  if  a  compile-time  technique  can  be  found  to 
enforce  it  on  the  scheduler  run-time  mechanism: 

•  Compile-time  analysis  of  whether  the  machine  will  meet  the  required  data  rate 
becomes  easy. 

•  Since  the  nodes  are  associated  with  AP's  at  compile-time,  it  becomes  possible  to 
take  into  consideration  optimization  such  as  chaining.  For  example,  although  it  is 
possible  to  assign  nodes  in  the  above  example  in  several  ways,  the  assignment 
shown  enables  chaining  nodes  {a,b,c,d}  together  and  chaining  nodes  {e,f}  together 
to  minimize  the  set-up  and  brrakdown  overheads. 
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•  Once  it  has  been  determined  which  nodes  are  to  be  chained,  the  data  queues  can  be 
allocated  to  GM's  so  that  the  GM  contention  is  minimized. 

•  There  is  no  non-uniformity  in  the  output  generation. 

3.  Disadvantages 

Since  RC  scheduling  keeps  track  of  more  node  relationship  information,  there 
is  an  immediately  higher  overhead  in  this  area.  But,  this  overhead  can  be  absorbed  by 
node  chaining  as  discussed  in  Chapter  VI.  The  requirement  for  the  scheduler  to  support 
on-line  reconfiguration,  typically  performed  by  the  operator  by  replacing  one  or  more 
branches  of  the  PGM  graphs,  is  difficult  to  implement  without  significantly  increasing  the 
complexity  of  the  RC  algorithms.  Yet,  this  appears  to  be  feasible  at  the  macro-language 
level  and  is  an  important  aspect  of  this  approach  deeming  further  investigation. 

4.  A  Simple  Example 

A  RC  schedule  for  the  graph  of  Figure  6  is  shown  in  Table  in.  After  every 
six  clock  cycles,  another  instance  of  the  modified  graph  can  be  overlapped  with  the 
preceding  instance.  Since  the  latencies  for  the  RC  algorithm  are  absorbed  by  the  during 
the  first  six  cycles  of  execution  as  the  cylinder  fills,  the  sporadic  latencies  shown 
previously  in  Table  Q  for  the  FCFS  assignment  are  eliminated  and  a  uniform  output  rate 
is  generated. 

The  remainder  of  this  thesis  concentrates  on  developing  a  simulator  which 
implements  the  AN/UYS-2  architecture,  and  on  analyzing  the  performance  of  the 
Revolving  Cylinder  assignment  algorithm. 
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Tabl«  ZZZ:  RC  ASSIGNMENT  SCHEDULE  FOR  SAMPLE  PGM  GRAPH 


|cycle  #  (i  i  1) 

API 

AP2 

1  6i  -  5 

a(i) 

e(i-l) 

1  6i  -  4 

b(i) 

e(i-l) 

1  6i  -  3 

c(i) 

f(i-l) 

1  6i  -  2 

c(i) 

f(i-l) 

1  6i  -  1 

d(i) 

e(i-l) 

1  6i _ |d(i) 

e(i-l) 
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IV.  THE  SIMULATOR 


This  chapter  examines  the  structure  of  the  AN/UYS-2  simulator  including  the 
language  chosen  and  method  of  implementation. 


A. 


IMFL] 


NTATION 


Before  the  simulator  could  be  developed,  the  method  of  PGM  representation  for  the 
simulator  needed  to  be  determined.  Implementation  of  PGM  on  the  simulator  consists 
of  maintaining  only  the  key  elements  of  the  graph.  The  actual  operation  of  the  AN/UYS- 
2  in  terms  of  DSP  was  not  implemented.  Figure  8  provides  the  recognizable  simulator 
input,  neglecting  the  columnar  headings,  for  the  PGM  graph  of  Chapter  HI,  Figure  4. 
The  graphical  description  should  be  located  in  an  American  Standard  Code  for 
Informatimi  Interchange  (ASCII)  Hie  'graph.'  Data  structure  for  the  graph  of  Figure  8 
ctmsists  of  two  parallel  queue  structures:  a  node  queue,  which  contains  pointers  to  its 
input  and  output  queues,  and  a  queue  queue  thereby  yielding  the  nee-like  structure  shown 
in  Figure  9.  The  gnodes,  'a'  through  'f,  shown  in  Figure  9  contain  all  of  the  node 
information  listed  in  Figure  8,  and  the  gqueues,  '1'  through  '9',  contain  all  of  the  queue 
information  documented  in  Figure  8.  The  arrows  shown  in  Figure  9  represent  pointers 
which  relate  the  graph  infrastructure.  Actual  code  implementation  for  input  of  the  graphs 
and  some  of  the  necessary  simulator  access  to  the  graphical  data  is  documented  in 
Appendix  B. 
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Figor*  8:  Graphical  Input  Format  for  the  simulator 


in  -  Pointers  to  input  Queues 
out  -  Pointers  to  output  Queues 


Figure  9:  Data  Structxire  Representation  of  a  PGM  Graph 
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Tnitial  design  theories  revolved  around  emulating  the  virtual  machine.  However, 
the  communication  complexities  of  the  AN/UYS-2  virtual  machine,  dictated  that  the 
physical  layout  of  the  ANAJYS-2  provide  a  reasonable  method  of  structuring  the  software 
simulator.  *  The  software  simulator  implements  a  subset  of  the  architecture  and 
instruction  set  while  maintaining  operational  similarities.  The  implemented  architecture 
and  instruction  set,  which  is  loosely  based  on  the  et5+  +  simulator  outlined  in  ECOS,  is 
demonstrated  in  Figure  10  [ECOS  89,  pp.  4-1  -  4-37].  Each  major  box  in  Figure  10 
represents  a  resource  element  implemented  in  the  simulator.  Each  set  of  abbreviations 


^  Shown  previously  in  Chapter  II,  Figure  1. 
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associated  with  the  arrows  represent  mess^es  passed  between  resource  elements.  The 
arrows  themselves  indicate  the  direction  of  message  flow  on  either  the  CBUS  or  the  DTN 
dq)aiding  on  the  message. 

1.  CtHomumcatioiis 

Like  all  distinct  entities  in  the  universe  and  the  actual  AN/UYS-2,  separate 
objects  must  communicate  with  one  another.  The  two  main  communication  nets,  DTN 
and  CBUS,  are  implemented  as  distinct  objects  themselves  within  the  simulator.  The 
CBUS  is  represented  as  a  data  structure  called  objea  and  a  boolean  variable.  The 
boolean  variable  represents  the  status  of  the  resource  elements  to  which  the  CBUS  is 
attempting  to  communicate.  The  objea  structure  contains  information  about  the 
following:  the  object  identification  number,  the  time  until  which  the  object  is  busy,  the 
processing  status  of  the  object,  the  transfer  status  of  the  object,  a  pointer  to  the  current 
objeanode  being  executed  by  the  object,  a  pointer  to  the  head  of  the  object’s  input  queue, 
and  a  pointer  to  the  tail  of  the  object's  input  queue.  The  objeanode  consists  of  a  message 
instruction,  the  graphical  node  identification  number  associated  with  that  message,  the 
graphical  queue  identification  number  associated  with  the  node,  the  message's  origin,  the 
message's  destination,  and  the  message’s  associated  location.  The  DTN  is  represented 
by  an  array  of  sixteen  objea  structures  as  discussed  earlier,  each  with  a  boolean  variable. 
Each  of  the  distinct  paths  established  is  assumed  to  be  capable  of  only  one  transfer  at  a 
time. 

The  main  simulator  program  polls  the  appropriate  CBUS  or  DTN  function 
processbus.  Processbus  determines  if  this  path  is  waiting  for  a  transfer  to  be  completed 
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at  the  desired  transfer  locations.  If  it  is  then  no  further  action  is  taken  this  clock  cycle. 
Otherwise,  processbus  conducts  the  requested  message  transfer  by  calling  the  appropriate 
functions  commencarfer  and  completafer,  then  processes  the  next  message  at  the  head 
of  its  input  queue.  Commencc^er  and  completcrfer  simulate  the  required  time  delay 
associated  with  the  message  transfer  and  place  the  message  in  the  destination  objea's 
queue.  The  data  rate  is  assumed  to  be  the  ANAJYS-2  data  maximum  of  seven  mega- 
words  per  second.  Code  for  the  transfer  aspects  of  the  simulator  has  not  been  included 
for  brevity. 

Since  the  ANAJYS-2  instruction  set  consists  of  over  100  messages,  a  partial 
instruction  subset  consisting  of  only  the  most  relevant  ones  was  chosen  [POPS  90,  pp. 
1-1  - 10-5].  Write  Queue  (WQ)  writes  data  generated  at  an  AP  by  a  DSP  function  to  the 
CM  responsible  for  that  queue.  It  incites  threshold  and  capacity  crossing  information 
within  the  GM,  spawning  Queue  Over  Threshold  (QOT),  Queue  Over  Capacity  (QOC), 
and  Queue  Under  Capacity  (QUQ  messages  [POPS  90,  pp.  7-46].  Queue  Over 
Threshold  is  the  message  by  which  the  GM’s  inform  the  SCH  that  a  queue  has  gone  over 
threshold.  It  causes  the  SCH  to  check  all  of  the  queues  associated  with  the  node  affiliated 
with  the  original  queue  to  see  if  all  are  over  threshold.  If  they  are,  it  precipitates  an 
Execute  Instruction  Stream  (HS)  or  Send  Instruction  Stream  (SIS)  from  the  SCH  to  either 
an  lOP  or  GM  [POPS  90,  pp.  6-44].  QOC  is  sent  by  the  GM’s  to  apprise  the  SCH  that 
a  queue  has  gone  over  capacity,  thereby  causing  the  SCH  to  suspend  further  execution 
of  the  node  feeding  the  queue  until  the  queue  has  returned  under  capacity  [POPS  90,  pp. 
6-43].  QUC  is  sent  by  the  GM's  to  apprise  the  SCH  that  a  queue  has  proceeded  under 
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capacity.  This  causes  the  SCH  to  continue  execution  of  the  nodes  feeding  this  queue 
[POPS  90,  pp.  6-46].  SIS  is  the  message  sent  by  the  SCH  to  a  GM  initiating  the  transfer 
of  a  node’s  instruction  stream  from  the  GM  to  the  AP  indicated  in  the  message.  SIS 
causes  the  GM  to  issue  an  Accept  Instruction  Stream  (AIS)  to  the  AP  [POPS  90,  pp.  7- 
38].  AIS  is  the  way  by  which  the  GM  communicates  the  instruction  stream  to  the  AP 
causing  it  to  load  and  execute  the  instruction  stream.  This  execution  includes  sending  the 
Read  Queue  (RQ)  requests  back  to  the  GM's  for  necessary  queue  data  [POPS  90,  pp.  9- 
22].  RQ  is  sent  by  the  AP  to  initiate  a  transfer  of  data  &om  a  queue  in  the  GM  to  the 
requesting  AP.  This  message  prompts  the  GM  into  sending  an  Accept  Queue  (AQ) 
instruction  to  the  AP  which  includes  the  required  queue  data  [POPS  90,  pp.  7-39].  AQ 
is  sent  from  the  GM  to  the  requesting  AP  in  order  to  transport  the  required  data.  This 
causes  the  AP  to  accept  and  store  the  specified  queue  data  which  was  requested  with  an 
RQ  [POPS  90,  pp.  5-131,  9-26].  When  the  AP  is  ready  to  accept  the  next  instruction 
stream,  it  sends  a  Ready  For  Instruction  Stream  (RFIS)  to  the  SCH.  This  causes  the  SCH 
to  attempt  to  generate  a  match  with  a  ready  node  if  one  is  available,  if  not,  then  the  free 
AP  identification  number  is  added  to  the  free  AP  list  [POPS  90,  pp.  6-49],  After  the  AP 
has  completed  breakdown  of  the  last  node  to  execute,  it  sends  a  Consume  Queue  (CQ) 
to  the  GM  that  maintains  the  input  queues  to  that  node.  This  causes  the  GM's  to  free  the 
memory  that  had  been  used  to  maintain  that  data  [POPS  90,  pp.  7-43].  EIS  is  sent  from 
the  SCH  to  the  lOP's  when  the  queues  over  threshold  belong  to  a  self-regulated  or  sink 
node  [POPS  90,  pp.  10-24]. 
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2.  Msyor  Resource  Elonents 


Message  passing  among  the  FE's  and  modular  functionality  demanded  that  they 
be  implemented  as  unique  objects.  The  simulator  expects  the  AN/UYS-2  structure  to  be 
in  lOP,  GM,  AP  numerical  order  in  a  created  ASCII  file  'EMSPSTRU. '  For  the  SCH 
and  each  lOP,  GM,  or  AP  specified,  the  simulator  generates  an  object  in  memory.  Each 
major  dement  is  represented  as  an  object  with  the  exception  of  the  ISC,  which  is  treated 
as  an  lOP  element.  The  simulator  requests  the  user  to  input  the  type  of  simulation 
desired:  FCFS  or  RC,  and  the  last  instance  of  the  graph  that  the  user  wishes  to  examine. 
If  RC  is  selected,  the  cylinder  is  created,  and  dependencies  assigned.  The  expected 
number  of  simulated  micro-seconds  are  calculated,  and  the  main  simulation  begins  by 
repeatedly  polling  in  order  lOP's,  CBUS,  DTN,  GM’s,  SCH,  and  AP’s.  Each  loop 
count  represents  one  simulated  micro-second. 

Code  implementation  of  the  main  simulator  program  and  some  of  the  common 
object  code  is  located  in  Appendix  C.  Simulator  C++  constructor  and  destructor  code 
and  self  explanatory  functions  are  not  included  for  brevity.  Constructors  generate  the 
initialization  specified  by  the  programmer.  Destructors  delete  the  pointers  designated  by 
the  programmer  and  free  the  memory  storage. 

a.  The  Input/Ou^ut  Processor 

The  lOP  data  structure  is  represented  by  another  data  structure  called  an 
objea,  a  pointer  to  a  list  of  nodes  assigned  to  this  lOP,  and  a  pointer  to  the  next  lOP. 
Upon  initial  graph  loading,  each  external  input  or  output  node  is  assigned  to  an  lOP  by 
the  graph  loading  function,  loadgraph.  Each  time  through  the  main  simulator  loop,  the 
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lOP  function,  processiop,  is  polled.  Processiop  generates  the  external  input  queues' 
Transfers  via  the  DTN  are  accomplished  when  the  data  period  specified  in  the  input  graph 
for  each  external  input  queue  is  met  as  determined  by  the  time  of  the  simulated  clock. 
Once  the  period  is  satisfied,  a  WQ  instruction  is  generated  and  sent  via  the  DTN  to  the 
GM  which  was  designated  to  retain  that  queue's  information.  For  each  external  output 
node  assigned,  the  lOP  acts  to  execute  the  node  and  generate  external  output  queue  data 
whoi  specified  by  a  EIS  mess^e  horn  the  SCH.  No  actual  implementation  of  inpu,  or 
output  is  performed.  Code  dealing  with  the  lOP  implementation  including  all  pertinent 
functions  necessary  to  access  the  lOP  structures  appears  in  Appendix  D. 

b.  The  Global  Memories 

The  simulator  GM's  do  not  store  the  actual  queue  data  involved  in  node 
processing.  The  GM  data  structure  is  represented  by  the  data  structure  objea  and  a 
pointer  to  the  next  GM.  Each  time  through  the  main  simulator  loop,  the  GM  function, 
processgm,  is  polled.  Through  its  objea  oriented  sub-fimction  processgmnode,  processgm 
simulates  the  GM  execution  as  follows.  Since  queue  sizes  only  change  as  messages  are 
processed,  the  simulator  GM's  perform  their  queue  size  determination  when  WQ  or  CQ 
messages  are  received  and  generate  the  appropriate  QOC,  QUC,  or  QOT  message  to  the 
SCH  after  inserting  the  appropriate  execution  delay  time.  GM  messages  like  RQ  or  SIS 
result  in  redirection  and  time  determination  and  delay  only.  Pertinent  Implementation 
code  for  the  GM's  appears  in  Appendix  E. 
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c.  The  Scheduler 


The  SCH  data  stnicmre  consists  of  the  following:  an  objea  data 
structure,  a  pointer  to  a  list  of  free  AP*s,  a  pointer  to  a  list  of  ready  nodes,  a  pointer  to 
a  list  of  nodes  currently  executing  on  AP's,  and  a  pointer  to  a  list  of  inhibited  nodes. 
The  simulator  scheduler  emulates  the  AN/UYS-2  scheduler  by  acting  on  the  following 
instructions:  RFIS,  QOC,  QUC,  and  QOT.  Upon  receipt  of  a  RFIS  message  from  an 
AP,  the  free  AP  list  is  updated.  A  QOC  message  results  in  appending  the  node  which 
supplies  the  queue  associated  with  the  QOC  message  to  the  inhibited  list.  When  a  QUC 
message  is  received,  the  node  in  question  is  removed  from  the  inhibited  list.  The  ready 
list  is  not  affected  since  the  node  is  already  in  this  list.  The  simulator  SCH  maintains  a 
ready  node  list  which  is  annotated  anytime  a  QOT  message  is  received  from  the  GM’s 
that  results  in  all  queues  for  the  node  associated  with  the  message  going  over  threshold. 
This  section  of  the  function  processsch  forms  the  meat  of  the  simulator  SCH.  For  the 
FCFS  case,  the  SCH  matches  the  first  entry  in  the  free  AP  list  with  the  first  entry  in  the 
ready  node  list.  Once  the  match  is  made,  that  ready  node  is  placed  in  the  executing  list 
to  inhibit  its  execution  until  its  current  execution  has  completed.  In  the  RC  case,  the 
SCH  updates  the  ready  node  list  and  free  AP  list  as  above,  but  before  a  match  is  made, 
the  ready  node's  RC  tokens  are  checked  for  satisfactory  conditions.  Only  once  tokens 
are  satisfied  is  a  match  allowed  to  proceed.  Actual  code  implementation  of  the  SCH 
appears  in  Appendix  F. 
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d.  The  Arithmetic  Pmcessors 

The  AP  data  structure  is  represented  by  three  distinct  object  data 
structures,  the  number  of  set-up  nodes  involved,  the  number  of  breakdown  nodes 
involved,  the  AP  set-tp  status,  and  the  AP  breakdown  status.  Three  distinct  object  data 
structures  were  chosen  due  to  the  AP's  ability  to  simultaneously  perform  set-up, 
breakdown,  and  execution  on  different  nodes.  The  main  AP  function  processap,  which 
is  polled  at  every  iteration  of  the  main  program,  performs  the  breakdown,  execution,  and 
set-up  status  checks  and  forwards  the  node  message  information  between  stages.  The 
AP's  simulate  the  actual  node  setip,  breakdown,  and  execution  by  entering  delay  cycles 
a  corresponding  to  that  nodes  execution  data  loaded  at  simulator  run-time.  Upon  the 
transfer  of  information  between  the  set-up  and  execute  stages,  which  implies  that  a  new 
node  is  now  being  executed,  the  AP  issues  a  RFIS  message  to  the  SCH.  Through  its  sub- 
fmcdon processt^node,  processap  performs  the  message  actions  and  redirection  required. 
The  AP's  act  as  receptors  for  the  AIS  message  sent  by  the  GM’s.  Upon  receipt  of  the 
AIS  message,  the  set-up  stage  of  the  AP  determines  the  number  of  input  queues  required 
and  issues  that  many  RQ  instructions  to  the  GM's  associated  with  the  AIS  message. 
Upon  receipt  of  an  AQ  message  from  the  GM’s,  the  AP  set-up  stage  updates  its  delay 
time  and  waits  for  the  last  AQ  message  that  it  is  expecting  for  that  node.  After  receiving 
the  last  AQ  message,  it  executes  a  transfer  of  information  to  the  execute  stage  if  the 
execute  stage  is  not  busy.  Otherwise  it  waits  until  it  is  able  to  perform  the  transfer. 
When  the  breakdown  stage  receives  the  information  from  the  execute  stage,  it  generates 
a  WQ  instruction  for  every  output  queue  associated  with  the  node  that  was  executed  and 
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generates  a  CQ  instruction  for  every  input  queue  associated  with  that  node. 
Implementation  code  for  the  AP's  appears  in  Appendix  G. 

B.  THE  LANGUAGE  -  C+  + 

During  the  1980*s,  the  C  programming  language  matured  through  the  addition  of 
classes,  type  checking  and  conversions,  virtual  functions,  and  operator  overloading  until 
a  new  language,  C++,  resulted  [STROUSTRUP  86,  pp.  1-12,  POHL  90,  pp,  5].  "Since 
its  conception,  C+ +  has  been  evolving  to  meet  the  needs  of  its  users."  [STROUSTRUP 
90,  pp.  3]  C++  has  endured  much  use  on  large  software  projects.  Its  stability, 
compatibility,  space-efficiency,  and  run-time  features  have  been  strongly  documented  by 

Ellis,  Stroustrup,  and  Pohl  [POHL  89,  pp.  1-25]. 

* 

Since  the  ANAJYS-2  and  C+  +  2.0  are  products  of  AT&T  Bell  Laboratories, 
C++  was  considered  a  forerunner  in  the  choice  of  programming  languages.  C++  was 
chosen  to  match  the  modularity  of  the  AN/UYS-2.  The  reasons  for  which  C+ +  2.0  was 
chosen  over  Ada  include: 

•  Object  Oriented  Programming 

•  Encapsulation 

•  Inheritance. 

Object  oriented  programming  allows  abstract  data  types  to  be  constructed  so  as  to 
allow  the  programmer  to  model  the  problem  domain  within  a  class  structure.  A  class  can 
be  thought  of  as  an  extension  of  the  idea  struct  from  traditional  C.  The  AN/UYS-2's 
distinctive  modularity  of  components  lends  itself  precisely  to  programming  with  objects. 
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Encapsulation  refers  to  C++ 's  ability  to  completely  self-enclose  both  the  internal 
implementation  details  of  the  type  and  functions  that  act  on  objects  of  that  type  from  the 
extmnal  user  [POHL  89,  pp.  2,211].  Encapsulation  ensures  that  unqualified  simulator 
personnel  will  not  be  able  to  access  or  blindly  modify  the  simulator  objects. 

Stroustrup  defines  inheritance  as  the  mechanism  of  deriving  a  new  class  from  an 
existing  base  class  [STROUSTRUP  90,  pp.  2].  Due  to  the  hierarchical  structure  of  the 
ANAJYS-2,  inheritance  guarantees  that  key  conceptual  ideas  remain  intact  from  objea-to- 
objea. 

C.  USER  INTERFACE 

To  execute  the  simulator  the  user  simply  invokes  the  executable  version  and  follows 
the  on-screen  prompts.  The  simulator  collects  data  about  the  individual  graph  nodes. 
The  time  that  sink  nodes  are  sent  to  the  lOP's  are  output  to  a  file  'results'.  The  time  that 
an  instance  is  placed  onto  the  ready  list  and  the  time  at  which  an  instance  is  removed 
from  the  ready  list  are  used  to  calculate  the  time  an  instance  spends  on  the  ready  list. 
This  time  gives  a  good  indication  of  the  delay  time  involved  in  scheduling  a  node.  The 
time  that  a  node  spends  between  breakdown  completion  and  its  successor  node  being 
placed  on  the  ready  list  is  calculated  to  provide  an  indication  of  communication  delay. 
Similarly,  the  time  between  a  node  being  removed  from  the  ready  list  until  it  arrives  at 
an  AP  also  provides  information  about  communication  delay.  The  time  which  the  AU 
portion  of  the  AP  spends  not  busy  is  calculated  to  provide  details  about  set-up  and 
breakdown  delay  times.  This  information  is  provided  to  the  user  on  a  screen  display 
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upon  completion  of  the  simulation  for  the  node  instances  specified  by  the  user.  Currently 
only  a  span  of  twenty  node  instances  can  be  examined  per  simulation  run. 

D.  LIMITATIONS  OF  THE  SIMULATOR 

Like  any  full  scale  simulator,  this  AN/UYS-2  simulator  is  not  intended  to  be  of  all 
inclusive  design.  The  simulator  lOP  assumes  that  an  unlimited  buffer  is  available  for 
incoming  data.  Execution  of  self-regulated  nodes  in  lOP's  is  not  implemented.  The 
ISC's  are  simplified  to  lOP's  for  the  simulation.  The  initial  assignment  of  queues  to 
GM's  is  assumed  to  be  completely  known  upon  initialization.  All  input  queues  associated 
with  a  node  are  assigned  to  the  same  GM  by  the  simuhuor.  Unlike  the  actual  ANAJYS- 
2,  the  simulator  is  severely  restricted  by  the  amount  of  memory  available  on  which  to  run 
it.  While  the  simulator  is  written  to  be  ported  between  any  machine  capable  of  C+ -f 
version  2.0,  large-grain  data-flow  graphs  quickly  fiU  available  memory  and  disk  space. 
The  time  required  to  perform  each  simulation  discussed  in  Chapter  V  exceeds  four  hours 
on  a  33-mega-hertz  80486  (16.11  Million  Instructions  Per  Second). 
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V.  PERFORMANCE  EVALUATION 

As  mentioned  previously,  a  digital  signal  processor's  performance  can  be  measured 
by  throughput.  Unfortunately,  Ae  AN/UYS-2*s  throughput  is  not  based  solely  on  its 
scheduling  algorithm.  The  number  of  nodes  and  queues  in  the  graph  and  their  sizes 
omtribute  to  the  oveihead  involved.  For  this  reason  several  types  of  application  graphs 
were  chosen  for  analysis  of  the  AN/UYS-2  performance  as  it  pertains  to  the  scheduling 
algorithms.  All  of  the  results  obtained  are  based  on  varying  the  input  data  rate. 

A-  CORRELATOR  APPUCATION 

The  initial  correlator  application  was  chosen  to  provide  an  actual  on-hands  non- 
uniform  ECOS  graph  as  outlined  by  ECOS  [ECOS  90,  pp.  H-CG-5  -  H-CG-27], 

1.  Description 

The  graphical  diagram  for  this  graph  appears  in  Figtire  1 1 .  Again,  the  circles 
indicate  primitive  nodes  to  be  executed  and  the  arrows  represent  the  queues  that 
manipulate  the  data  required  by  the  nodes.  ”T"  represents  the  threshold  value  required 
for  that  queue  before  scheduling  of  the  subsequent  node  can  be  arranged.  "R"  represents 
the  amount  that  is  read  by  the  subsequent  node  on  execution  set-up.  "C"  represents  the 
amount  that  is  consumed  on  subsequent  node  breakdown.  "P"  represents  the  production 
amount  from  the  previous  node.  Actual  execution  times  for  the  primitives  listed  beside 
the  nodes  were  calculated  by  use  of  the  Graph  Primitives  Library  and  are  included  with 
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rigur*  11:  Graphical  Description  of  Correlator  Application 

the  primitive  name.  The  graph  would  be  input  to  the  simulator  after  performing  the 
manual  conversion  to  the  standard  numerical  format  described  in  Chapter  IV. 


2.  Output  and  Interpretation 

The  points  obtained  for  the  graphs  plotted  in  the  case  of  the  correlator  graph 
were  taken  at  five  percent  intervals  except  in  the  region  of  close  similarity  where  the 
interval  was  one  percent.  The  normalized  input  data  interval  refers  to  the  theoretical 
maximum  throughput  rate  for  the  application  assuming  no  internal  delays.  As  discussed 
in  Chapter  m,  the  theoretical  maximum  throughput  rate  is  based  on  the  total  execution 
time  for  the  graph  divided  by  the  number  of  AP's  in  the  configuration.  Therefore,  the 
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normalized  input  data  interval  axis  is  based  on  interval  times  greater  than  the  theoretical 
maximum  of  1.0.  Normalized  mean,  which  refers  to  the  mean  time  between  output 
arrivals  divided  by  the  theoretical  input  data  interval,  results  are  shown  in  Figure  12  and 
Figure  13.  The  closer  the  value  to  uniformity,  the  better  the  performance.  While  not 
discernible  in  Figure  12,  Figure  13  clearly  indicates  that  the  RC  algorithm  reaches  unity 
five  percent  before  the  FCFS  algorithm.  At  all  times  the  RC  curve  remains  below  the 
FCFS  curve  on  the  graph.  The  correlator  graph  normalized  standard  deviation  is  shown 
in  Figure  14.  The  normalized  standard  deviation  curve  indicates  that  the  RC  algorithm 
provides  a  more  uniform  output  than  does  the  FCFS  algorithm  throughout  the  range  of 
input  data  intervals.  Figure  IS  demonstrates  this  output  arrival  by  plotting  the  normalized 
completion  time  for  the  first  thirty  graph  instances  observed.  Due  to  the  dependencies 
inserted  by  the  RC  algorithm,  the  AP  efficiency  is  lower  for  the  RC  case  than  for  the 
FCFS  case  until  uniformity  in  output  is  obtained  as  shown  in  Figure  16.  This  result  is 
caused  by  the  dependencies  inhibiting  the  earlier  nodes  in  the  graph  from  executing  until 
they  ^  satisfied. 

While  the  AP  efficiency  is  slightly  lower  for  the  RC  approach,  the  lower 
normalized  mean  and  standard  deviation  results  indicate  a  slight  improvement  by  use  of 
the  RC  algorithm  over  the  current  scheduling  technique.  The  output  times  associated  with 
the  thirtieth  observed  graph  instance  also  bears  out  this  analysis. 
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Plgtu:*  15:  Correlator  Graph  Normalized  Output  Completion  Time 
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7igur«  16:  Correlator  Graph  AP  Efficiency 


B.  CORRELATOR  WITH  UNIFORM  NODE  SIZES 
1.  Description 

To  examine  the  effect  of  varying  execution  tinges  on  the  nodes  in  the  application, 
the  same  grs^hical  structure  was  maintained,  but  the  execution  times  of  the  nodes  were 
changed  to  be  uniform  as  shown  in  Figure  17.  The  uniform  size  of  36000  micro-seconds 
per  node  was  selected  based  on  maintaining  the  same  overall  execution  time  of  the  graph. 
It  was  assumed  that  primitives  could  be  restructured  to  meet  this  size  without  major 
revisions  to  queue  sizes  required. 
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Correlator  Graph  structure  with  Uniform  Sized 


2.  Output  and  Interpretation 

The  normalized  means  for  the  correlator  graph  structure  with  equal  execution 
time  nodes  is  shown  in  Figure  18  and  Figure  19.  It  can  be  seen  that  as  long  as  the  input 
data  rate  is  not  being  met,  the  RC  algoritiim  performs  better  than  the  FCFS  algorithm. 
But,  botii  the  RC  and  FCFS  algorithms  reach  the  ability  to  meet  the  input  rate  at  the  same 
time.  The  normalized  standard  deviation  curve  is  shown  in  Figure  20.  The  RC  standard 
deviation  never  rises  above  1.0,  Therefore,  the  RC  algorithm  produces  a  more  uniform 
output  than  does  that  of  the  FCF*'  algorithm.  Figure  21  demonstrates  that  as  long  as  the 
input  data  rate  is  not  being  met,  the  RC  algorithm  completes  thirty  graph  instances  before 
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Figure  18:  Correlator  Structtire  Means  for  Equal  Node  Times 


Figure  19:  Correlator  Structure  Means  Blow  Up  with  Equal 

Times 
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Normalized  Completion  Fime  lor  30  Inatoncea  I  ih  I  Normoized  Stondord  Deviation 


or*  20:  Correlator  Structure  Deviation  with  Equal  Times 


Figure  21:  Equal  Correlator  Structure  Output  Arrival  Times 
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the  FCFS  algorithm.  The  percent  AP  efficiency  plotted  in  Figure  22  is  the  same  for  both 
the  RC  and  FCFS  scheduling  algorithms. 

Since  the  AP  efficiency  is  the  same  for  both  the  RC  and  FCFS  cases,  the 
dependencies  inerted  by  the  RC  algorithm  do  not  result  in  slowing  the  AP  utilization 
rate.  While  this  is  an  improvement  over  the  first  correlator  graph  examined,  the 
normalized  means  for  the  correlator  structure  with  equal  size  nodes  reach  the  1.0  value 
at  precisely  the  same  instance,  65%,  tending  to  indicate  no  improvement.  However,  the 
normalized  standard  deviation  for  the  RC  approach  never  varies  from  0.0.  This  indicates 
a  perfectly  uniform  ouq)ut  regardless  of  the  load  level,  including  a  five  percent  band  after 
both  the  RC  and  FCFS  normalized  means  reach  1.0.  Again,  this  indicates  a  slight 
improvement  for  the  RC  case. 


PCFS  Oaah«d.  RC  Sclid,  for  Modified  2  Correlator  Graph 


Figure  22:  AP  Efficiency  for  Correlator  Structure  Equal  Times 
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C.  CORRELATOR  WITH  CHAINED  NODES 


In  order  to  examine  the  effect  of  the  graph  structure  as  compared  to  the  uniform 
sizes  of  the  node,  the  original  correlator  graph  was  chained  to  uniform  node  sizes.  This 
chaining  of  nodes  resulted  in  a  different  graph  structure  with  different  sized  nodes.  All 
of  the  queue  information  is  assumed  to  remain  consistent. 

1.  Description 

The  graphical  representation  of  the  correlator  graph  is  shown  in  Figure  23. 
The  execution  times  for  the  nodes  are  documented  beside  them.  This  graph  was  derived 
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from  the  basic  correlator  graph  by  again  minimizing  the  variation  in  total  execution  time. 
However,  this  graph  possesses  a  different  graphical  structure  from  the  correlator  graph 
with  equal  execution  times  because  in  this  graph  the  actual  individual  primitives  have 
been  chained  and  segregated  as  best  that  could  be  determined. 

2.  Output  and  Interpretation 

The  normalized  means  are  shown  in  Figure  24  and  Figure  25.  Both  the  RC 
and  FCFS  algorithms  possess  the  same  normalized  means  even  before  the  input  data  rate 
is  met.  Figure  26  shows  the  normalized  standard  deviation.  While  the  RC  algorithm 
maintains  at  least  as  well  a  normalized  standard  deviation  as  the  FCFS  algorithm,  the 
relative  closeness  of  the  numbers  precludes  any  explicit  determination.  The  normalized 
observed  instance  completion  times  for  thirty  graph  instances  are  shown  in  Figure  27. 


Figuru  24:  Correlator  Graph  Structure  Mean  with  Chained  Nodes 
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Figiura  26;  Correlator  Graph  Structure  Deviation  Chained  Nodes 
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FCFS  Dash«d,  RC  Solid,  for  Modified  Correlator  Graph 


rigur«  27:  Co.rrelator  Graph  Structure  Output  Times  Chained 

Figure  28  demonstrates  a  slight  AP  efficiency  difference  between  RC  and  FCFS 
algorithms. 

Again,  once  the  input  data  rate  is  close  to  being  met,  the  RC  and  FCFS 
algorithms  perform  the  same.  Yet,  for  data  rates  which  are  not  met,  the  RC  algorithm 
completes  the  instances  sooner  than  does  the  FCFS  algorithm  with  a  lower  normalized 
standard  deviation.  But,  unlike  the  correlator  graph  or  the  correlator  graph  with  equal 
nodes,  there  is  no  five  percent  improvement  overlap  seen  in  the  mean  and  standard 
deviation.  Therefore,  it  appears  that  once  the  input  data  rate  is  met,  chaining  performs 
equally  as  well  as  the  RC  approach. 
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Normalizad  Input  Data  Interval 


7igur«  28:  Correlator  Graph  Structure  AP  Efficiency  Chained 


D.  FFT  APPUCATION 

A  Fast  Fourier  Transform  (FFT)  Application  was  chosen  to  examine  the  effects  of 
the  RC  scheduling  algorithm  on  a  large  scale  communication  intensive  interconnected 
application.  A  communication  intensive  application  is  characterized  by  large  queues 
resulting  in  high  communication  overheads  on  the  DTN  and  CBUS  for  message  passing. 
This  typically  results  in  low  overall  AP  efficiency. 

1.  Description 

The  data-flow  graph  for  a  two  dimensional  (2-D)  FFT  can  be  represented  in 
terms  of  that  of  a  one  dimensional  (1-D)  FFT.  This  application  assumes  a  256  point 
vector  of  inputs.  The  1-D  FFT  shown  in  Figure  29  can  be  calculated  in  log  256  (8) 
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stages  of  butteifty  operations  with  128  butterflies  per  stage.  Each  stage  can  be  divided 
into  p  parallel  tasks,  with  256/2p  butterflies  per  task.  As  the  tasks  in  stage  i  finish,  they 
sold  their  outputs  to  the  tasks  in  stage  (i+i).  The  data-flow  graph  for  a  2-D  FFT  uses 
2log  256  (16)  stages  to  transform  a  256  x  256  matrix  of  inputs.  256  1-D  FFT's  are 
computed  for  rows  followed  by  another  256  1-D  FFT's  for  columns.  Tasks  in  the  first 
8  stages  perform  1-D  FFT's  on  all  256  rows  with  each  task  performing  256^/2p 
butterflies.  Tasks  in  stage  log  256  send  data  to  tasks  in  stage  (8  +  1)  in  such  a  way  that 
the  second  set  of  8  stages  performs  256  colunm  transforms.  The  numbers  beside  the 
queues  represent  tpreue  over  threshold^  production^  and  consume  values  in  micro-seconds. 
[SHUKLA  90,  pp.  48-51] 

2.  Output  and  Interpretation 

The  normalized  FFT  means  are  shown  in  Figure  30  and  Figure  3 1 .  Here  also, 
the  input  data  rate  is  met  for  the  RC  algorithm  five  percent  before  that  of  the  FCFS 
algorithm.  The  input  data  rate  is  not  met  until  further  down  in  the  percentage  range  due 
to  the  high  communication  overhead  involved  with  this  graph.  The  normalized  standard 
deviations  are  shown  in  Figure  32  and  Figure  33.  Again,  clearly  the  RC  standard 
deviation  out  performs  the  FCFS  standard  deviation  throughout  the  spectrum  of  input  data 
rates.  Also  note  the  consistency  that  exists  in  the  RC  normalized  standard  deviation 
across  the  spectrum.  Figure  34  documents  the  observed  normalized  completion  times  for 
the  first  thirty  graph  instances.  The  RC  algorithm  generates  the  results  quicker  than  the 
FCFS  algorithm.  Figure  35  demonstrates  the  differences  in  AP  efficiency  for  the  FFT 


Normoized  Stondord  DevioUon  I  ib  I  Normalized  Stondord 


ax«  32:  FFT  Stemdard  Deviation 


Figure  33:  FFT  Standard  Deviation  Blow  Up 
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X  AP  Efficiency  I  «b  I  Normolized  Completion  Time  for  30  Instances 


FCFS  Oo'hed.  RC  Solid,  for  64  Noda  Groph 


iar«  34:  FFT  Normalized  Instance  Completion  Output 


Figure  35:  FFT  AP  Efficiency 


graph.  The  low  values  are  caused  by  the  communication  overhead  involved  in  processing 
this  type  of  graph. 

The  RC  approach  possesses  a  greater  AP  efficiency  due  to  the  assigned 
dependencies  limiting  the  communication  traffic  on  the  DTN  and  CBUS.  The  RC 
a{^)roach  normalized  mean  reaches  unity  five  percent  before  the  FCFS  normalized  mean. 
Additionally,  the  normalized  standard  deviation  is  consistently  less  than  0.5  regardless  of 
load  level.  This  implies  that  a  much  more  uniform  output  results  from  the  RC  algorithm 
regardless  of  load.  Since  the  observed  normalized  completion  time  for  thirty  graph 
instances  is  also  less  for  the  RC  approach  than  the  FCFS  approach,  it  can  be  concluded 
that  the  RC  approach  performs  well  with  communication  intensive  applications. 
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VI.  CONCLUSIONS 


Due  to  the  increased  complexity  of  Navy  sensor  systems  and  the  increased  signal 
processing  requirements  expected  within  the  next  ten  years,  a  different  scheduling 
approach  may  extend  the  lifetime  of  the  current  ANAJYS-2  hardware  without  major 
expense. 


A.  SUCCESS  OF  THE  RC  APPROACH 

The  performance  results  obtained  using  a  simulation  of  simple  applications  indicate 
the  effectiveness  of  the  proposed  technique  in  impro'/ing  performance  and  predictability 
for  communication  bound  graphs. 

1.  Communication  Intensive 

Communication  intensive  applications  such  as  the  sixty>four  node  FFT  graph 
analyzed  in  Chapter  V  document  an  extensive  improvement  by  use  of  the  RC  algorithm 
over  that  of  FCFS  scheduling.  The  RC  approach  produces  a  more  uniform  throughput 
while  improving  the  AP  efficiency  and  limiting  the  communication  overhead. 

2.  Non-Communication  Intensive 

The  non-communication  intensive  applications,  like  the  correlator  graph 
analyzed  in  Chapter  V,  demonstrate  improvement  dependent  on  the  node  execution  times. 
For  nodes  with  uniform  execution  times,  no  substantial  improvement  was  noted.  But  for 
non-uniform  execution  times,  quicker,  more  uniform  throughput  was  observed. 
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Since  chaining  of  node:,  is  a  matter  of  trial  and  error  in  FCFS  scheduling,  and 
unrestrained  chaining  results  in  loss  of  parallelism  which  can  be  detrimental  to  uniform 
AP  utilization  and  throughput,  the  RC  schedule  offers  a  reasonable  approach  that  would 
limit  the  work  of  the  application  programmer. 

B.  IMPROVING  RC 

If  chaining  is  specified  within  the  framework  of  RC  assignment,  its  effect  can  be 
accurately  predicted.  An  algorithm  to  use  chaining  in  RC  assignment  is  being  developed. 
Since  the  cylinder  creates  a  semblance  of  assigning  nodes  to  AP's  even  though  it  is  not 
enforced,  use  of  this  knowledge  can  absorb  even  the  communication  overheads  involved 
between  nodes. 

Given  a  specific  assignment,  it  is  known  which  queues  are  accessed  at  the  same 
time.  This  information  can  be  used  to  algorithmically  assign  GM's  to  queues  so  that  each 
GM  operates  at  the  maximum  possible  bandwidth. 

The  current  RC  algorithm  can  introduce  several  different  control  token  sets. 
Establishment  of  criteria  to  select  the  minimal  set  of  dependencies  needs  to  be  developed. 

It  is  also  required  that  the  AN/UYS-2  support  on-line  reconfiguration  of  the  PGM 
graph  for  an  application.  This  reconfiguration  is  typically  performed  by  the  operator  by 
removing  or  adding  one  or  more  branches  of  the  PGM  graphs.  In  RC  assignment,  new 
nodes  could  be  assigned  in  the  empty  slots  so  as  to  leave  the  rest  of  the  assignment 
undisturbed. 
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C.  PROPOSED  RESEARCH 


1.  Hardware  Modifications 

Hardware  modifications  provide  the  long  term  solution  to  ensuring  that  the 
ANAJYS-2  parallel  signal  processor  remains  in  the  Navy  forefront.  The  modular  design 
of  the  AN/UYS-2  makes  hardware  modification  an  attractive  long  term  solution. 

a.  Systolic  Army  Processor 

Systolic  architectures  began  to  be  used  during  the  1980’s  [EVANS  82, 
pp.  343].  Each  PE  corresponds  to  one  binary  tree  level  in  the  systolic  array 
[MOSCOVTTZ  90,  pp.  355-357,  KUNG  85,  pp.  128-131].  Inclusion  of  a  systolic  array 
processor  will  expand  the  functional  capability  and  increase  the  performance  of  the 
AN/UYS-2  by  improving  its  processing  capabilities. 

b.  Open  Architecture 

An  open  architecture  would  ensure  that  current  "industry  technology" 
could  be  inserted  into  the  AN/UYS-2  without  the  extensive  cost  of  modification.  An 
open  architecture- can  be  obtained  by  continuing  to  develop  the  AN/UYS-2’ s  modularity 
while  maintaining  a  standard  software  communication  interface. 

2.  Software  Modifications 

Despite  the  attractiveness  of  hardware  solutions,  the  inexpensive  and  easy  to 
implement  software  modifications  could  yield  the  same  improvement  in  a  shorter  period 


a.  User  Friendly  Processing  Graph  Methodology 

Steps  taken  toward  a  more  user  fhendly  PGM  would  greatly  speed  up  the 
developer's  process.  Although  the  current  PGM  is  usable,  the  application  developer  must 
spend  many  hours  researching  the  primitives  and  establishing  the  basics  of  the  graph  that 
he  wishes  m  use. 

b.  Throughput  Enhancements 

The  throughput  enhancements  discussed  in  this  thesis  can  easily  be 
implemented  in  software  to  achieve  an  improvement  in  the  AN/UYS-2. 

(1)  Node  Chaining 

It  was  shown  that  the  chaining  of  nodes  under  the  FCFS  algorithm 
demonstrates  the  same  level  of  improvement  as  that  of  the  RC  algorithm  without 
chaining. 

(2)  Scheduling 

The  RC  scheduling  algorithm  offers  an  attractive  choice  to  achieve 
the  same  results  as  chaining  without  the  extra  effort  required  by  the  application 
programmers.  Implementation  of  this  algorithm  would  not  only  yield  an  improvement 
on  communication  intensive  graphs,  but  also  on  other  applications.  An  unlimited  or 
selective  loss  of  incoming  data  arriving  at  a  higher  input  rate  than  the  maximum 
achievable  by  the  AN/UYS-2  would  noj  result  in  unreasonable  delay  times. 


(3)  Fault  Tolerance 

Fault  tolerance  refers  to  the  malfunction  of  a  resource  element  which 
prevents  it  from  performing  its  function.  In  the  AN/UYS-2  this  fault  tolerance  is  handled 
by  (he  CPP  which  removes  the  associate  malfunctioning  element  from  the  machine. 
However,  while  this  would  still  work  in  the  existing  approach,  the  RC  technique  would 
lose  a  segment  of  its  cylinder.  It  appears  that  this  loss  of  performance  could  be  kept 
temporary  by  switching  to  another  cylinder  with  a  larger  circumference  on-line. 
However,  this  issue  needs  further  investigation. 
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APPENDIX  A:  REVOLVING  CYLINDER  CODE 


//  Description  ;  The  code  listed  below  follows  the  algorithms  outlined  in  Chapter  m. 
//  :  This  procedure  deals  with  the  initial  assigning  of  nodes  to  the  Cylinder. 

//  :  Nothing  else  need  be  said. 

//  Parameters  :  tempgnodelisting  -  the  node  graph  itself 
//  numaps  -  the  number  of  aps  in  the  AN/UYS-2 

dependencyqs  *topgraph  ::  assignrG(gnode  *1empgnodelisting,int  numaps)  { 
gnode  *temp2gnodelisting  =  tempgnodelisting; 

gnode  ’'tempSgnodelisting  NULL; 

ptrtoptrtoaq  *tempptrtoptr  =  NULL; 
topgraph  *q  *  NULL; 

topgraph  •qtemp  =  NULL; 

topgraph  •q2temp  =  NULL; 

topgraph  •temp  =*  NULL; 

int  count  =  0; 

long  int  circumference  =  0; 

long  int  circum2ference  =  0; 

long  int  maxwidth  »  0; 

long  int  minwidth  =  1000000; 

long  int  widthavg  =  0; 

long  int  j  s  0; 

long  int  t  =  0; 

cyltype  ‘cylinder  =  NULL; 

cyltype  •tempcylinder  =  NULL; 
cylentrytype  ‘cyl2entrylist  =  NULL; 
boolean  iopinnode  -  false; 

boolean  iopoutnode  =  false; 

dependencyqs  ‘headofdepqs  =  NULL; 
while  (temp2gnodelisting  !  =  NULL)  { 
tempptrtoptr  =  tempgnodelisting- >getgnodeinputqslist(temp2gnodelisting-> 
getnodeidO); 
while  (tempptrtoptr  !  =  NULL)  { 
if  (tempptrtoptr- >getnodein0  ==-!){ 
iopinnode  =  true; 
break; 

} 

else  { 

iopinnode  =  false; 
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}; 

tempptrtoptr  =  tempptrtoptr->getnextelementO; 

}; 

tempptrtoptr  =  tempgnodeUsting->getgnodeoutputqslist(temp2gnodelisting-> 
getnodeidO); 
while  (tempptrtoptr  !  =  NULL)  { 

if  (tempptrtoptr- >gemodeoutO  ==-!){ 

iopoutnode  «  true; 
break; 

} 

else  { 

iopoumode  =  false; 

}; 

tempptrtoptr  =  tempptrtoptr- >gemextelementO; 

}; 

if  ((iopinnode)  1 1  (iopoumode))  {  //  do  nothing 

} 

else  { 
count++; 
if(q»=NULL){ 
if  (!(q  =  new  topgraph))  { 
fprintf(stderr,  "Insufficient  memory  for  topgraph\n"); 
exit(l); 

}; 

q->id  =  temp2gnodelisting->gemodeid0; 
q->  width  =  tempgnodeUsting->getprimtime(q->id); 
if  (q->  width  <  minwidth)  { 
minwidth  =  q->  width; 

}; 

if  (q->  width  >  maxwidth)  { 
maxwidth  =  q->  width; 

};  //  note  q-  >est  set  equal  to  zero  by  constructor 
circumference  =  circumference  +  q->  width; 
qtemp  =  q; 

} 

else  { 

if  (!(qtemp->next  =  new  topgraph))  { 
fprintf(stderr,  "Insufficient  memory  for  topgraph\n"); 
exit(l); 

}; 

qtemp- >  next- >  id  =  temp2gnodeUsting->gemodeid(); 

qtemp-  >  next-  >  width  =  tempgnodelisting-  >  getprimtime(qtemp-  >  next-  >  id); 

if  (qtemp-  >  next-  >  width  <  minwidth)  { 
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minwidth  =  qtemp- >  next- >  width; 

}; 

if  (qtemp- >  next- >  width  >  maxwidth)  { 
maxwidth  =  qtemp- >  next- >  width; 

}; 

circumference  =  circumference  +  qtemp- >  next- >  width; 
qtemp  =  qtemp- >  next; 

}; 

}; 

temp2gnodelisting  =  temp2gnodelisting->gemextgnodeO; 

}; 

widthavg  »  minwidth; 
circumference  =  circumference  /  numaps; 
circum2ference  =  circumference; 
while  (maxwidth  >  circum2ference)  { 
circum2ference  =  circum2ference  +  widthavg; 

}; 

circumference  =  circum2ference; 
for  (int  i=*l;i<  =numaps;i++)  { 
if  (cylindCT  *  =  NULL)  { 
if  (!  (cylinder  =  new  cyltype))  { 
fprintf(stderr, "Insufficient  memory  for  cyltype\n"); 
cxit(l); 

}; 

j  =*  0; 

while  (j  <  circumference)  { 
if  (cylinder- >cylentrylist  ==  NULL)  ( 
if  (’.(cylinder- >cylentrylist  =  new  cylentrytype))  { 
fprintf(stderr, "Insufficient  memory  for  cylentrytype\n"); 
exit(l); 

}; 

.  cylinder- >cyIentrylist->  widthstarttime  =  j; 
cyl2entrylist  =  cylinder- >cylentrylist; 

} 

else  { 

if  (!(cyl2entrylist->nextcylentry  =  new  cylentrytype))  { 
fprintf(stderT, "Insufficient  memory  for  cylentrytype\n"); 
exit(l); 

}; 

cyl2entrylist-  >  nextcylentry-  >  widthstarttime  =  j ;  . 
cyl2entrylist  =  cyl2entrylist- >  nextcylentry; 

}; 

j  3=  j  +  widthavg; 
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}; 

tempcylinder  =  cylinder; 

} 

else  { 

if  (!  (tempcylinder- >nextcylap  =  new  cyltype))  { 
fi;irintf(stderT,"Insufficient  memory  for  cyltype\n"); 
exit(l); 

}; 

j  =  0; 

while  (j  <  circumference)  { 

if  (tempcylinder- >nextcylap->cylentrylist  ==  NULL)  { 

if  (!(tempcylinder->next(7lap->cylentrylist  =  new  cylentrytype)){ 
fijrintf(stderr, "Insufficient  memory  for  cylentrytype\n"); 
exit(l); 

}; 

tempcylinder-  >  nextcylap-  >  cylentrylist-  >  widthstarttme  =  j ; 
cyl2entrylist  =  tempcylinder- >  nextcylap- >  cylentrylist; 

} 

else  { 

if  (!(cyl2entrylist->nextcylentry  =  new  cylentrytype))  { 
fi!rintf(stderr, "Insufficient  memory  for  cylentrytype\n"); 
exit(l); 

}; 

cylentrylist-  >  nextcylentry-  >  widthstarttime  =  j ; 
cyl2entrylist  =  cyl2entrylist- >  nextcylentry; 

}; 

j  =  j  +  widthavg; 

}; 

tempcylinder  =  tempcylinder- >  nextcylap; 

}; 

}; 


qtemp  =  q; 

while  (qtemp  !  =  NULL)  { 
temp  =*  qtemp; 
qtemp  =  qtemp- >  next; 

if  (cylinder- >checkfreecylspace(cylinder)  <  (temp->width/widthavg))  { 
circumference  =  circumference  +  circumference; 

cylinder  =  cylinder- >increasecylsize(cylinder, circumference,  widthavg); 

qtemp  =  q; 
temp  =  qtemp; 
qtemp  =  qtemp- >  next; 

... 

t  =  schedulenode(temp,temp-  >est,circumference,widthavg,numaps,cylinder); 
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tempptrtoptr  =  tempgnodelisting->getgnodeoutputqsiist(temp->id); 
rscestiipdt(teinpptrtoptr,qtemp,t+temp-  >  width); 

}; 

headofdepqs  =  cylinder- >cr^tedeps(cylinder,ciicumference,widthavg); 
return  headofdepqs; 

}; 

//  Description  :  The  following  procedure  deals  with  scheduling  a  node  in  a  slot  in  the 
cylinder. 

//  Parameters  :  temp  -  the  graph  node  to  schedule 

//  t  -  the  time  to  attempt  to  schedule  at 

//  circum  -  the  circumference  of  the  cylinder 

//  widthavg  -  the  width  of  a  slot  in  the  cylinder 

//  nunuqw  -  the  number  of  aps  in  the  AN/UYS-2 

//  cyl  -  the  cylinder  itself 

long  int  topgr2q)h  ::  schedulenode(topgraph  *temp,long  int  t,long  int  circum, 

long  int  widthavg.int  numaps,cyltype  '*'cyl)  { 
cyltype  *tempcylinder  =  cyl; 

cylcntrytype  *tempcyl,*temp2cyl; 
boolean  scheduled  »  f^se; 

boolean  available  =>  false; 

long  int  insertime  =»  0; 

int  index  »  0; 

int  blockcount  -  0; 

long  int  oldinsertime  =  0; 

insertime  =  t; 
oldinsertime  »  insertime; 
while  (scheduled  = »  false)  { 
insertime  =  insertime  %  circum; 
if  (insertime  <  oldinsertime)  { 
index—; 

}; 

tempcylinder  =  cyl; 
for  (int  i=l;i<  =numaps;i+-f')  { 
tempcyl  =  tempcylinder- >cylentrylist; 

while  ((insertime  >  tempcyl-  >  widthstarttime)  &&  (tempcyl  !  =  NULL))  { 
tempcyl  =  tempcyl- >  nextcylentry; 

}; 

if  ((temp-  >  width  %  widthavg)  =  =  0)  { 
blockcount  =  temp- >  width  /  widthavg; 

} 

else  { 

blockcount  =  temp-  >  width  /  widthavg  +  1 ; 

}; 


74 


temp2cyl  *  tempcyl; 
for  (int  j  =  1  y  <  ^blockcount'J  +  +)  { 
if  (temp2cyi  =  =  NULL)  { 

//  at  end  of  current  circum  so  lets  check  back  at  beginning 
temp2cyl  =  tempcylinder->cylentrylist; 

}; 

if  (tenip2cyl->nodesch  ==  0)  { 
available  «  true; 

} 

else  { 

available  =  false; 
break; 

}; 

temp2cyl  =  temp2cyl->nextcylentry; 

}; 

if  (available)  { 
temp2cyl  *  tempcyl; 
for  (int  j = 1  y  <  “blockcounty + +)  { 
if  (temp2cyl  =  =  NULL)  { 

//  at  end  of  current  circum 
temp2cyl  =  tempcylinder->cylentrylist; 

}; 

temp2cyl->nodesch  =*  temp- >  id; 
temp2cyl->nodeicount  *  index; 
temp2cyl  =  temp2cyl->nextcyientry; 

}; 

scheduled  -  true; 
break; 

} 

else  { 

tempcylinder  =  tempcylinder->nextcylap; 

}; 

}; 

if  (scheduled  =  =  false)  { 
oldinserdme  =  insertime; 
insertime  =  (insertime  +  widthavg)  %  circum; 

}; 

}; 

return  insertime; 

}; 

//  Description  :  Checks  the  free  space  remaining  in  the  cylinder  by  checking 
//  to  see  if  a  node  other  than  zero  has  been  scheduled  into  that 

//  cylinder  slot. 
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//  Parameters  :  tempcylinder  -  The  cylinder 
long  int  cyltype  ::  checkfreecylspace(cyltype  ’"tempcylinder)  { 
cylentrytype  *tempcylentry; 
long  int  maxfree  =  0; 

long  int  currfiree  =  0; 

long  int  startfree  =  0; 

boolean  atstartofapcyl  =  true; 

while  (tempcylinder  !  =  NULL)  { 
tempcylentry  =  tempcylinder- >cylentrylist; 
atstartofapcyl  =  true; 
startfree  =  0; 
currfree  =  0; 

while  (tempcylentry  !  =  NULL)  { 
if  ((tempcylentry- >nodesch  ==  0)  &&  (atstartofapcyl))  { 
start&ee-»-4-; 
if  (startfree  >  maxfree)  { 
maxfree  =  startfree; 

}; 

} 

else  { 

atstartofapcyl  =  false; 
if,  (tempcylentry-  >  nodesch  =  =  0)  { 
currfree++; 
if  (currfree  >  maxfree)  { 
maxfree  =  currfree; 

}; 

} 

else  { 

currfree  =  0; 

}; 

if  ((tempcylentry-  >  nextcylentry  =  =  NULL)  &&  (currffee  >  0))  { 
if  ((currfree  +  startfree)  >  maxfree)  { 
maxfre»*  =  currfree  -I-  startfree; 

}; 

}; 

}; 

tempcylentry  =  tempcylentry- >  nextcylentry; 

}; 

tempcylinder  =  tempcylinder- >  nextcylap; 

}; 

return  maxfree; 

}; 

//  Description  :  Increments  the  size  of  the  cylinder  by  circum  and  resets  its 
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//  slots  to  reflect  no  nodes  scheduled. 

//  Parameters  :  tempcylinder  -  The  cylinder  itself 
//  citcum  -  The  size  to  increase  the  cylinder  to 

//  minwidthavg  -  The  cylinder  slot  size 

cyltype  *cyltype  ::  increasecylsize(cyltype  *tempcylinder,long  int  circum, 

long  int  minwidthavg)  { 
cyltype  ’*temp2cylinder  =  tempcylinder, 

cylentrytype  *tempcylentry; 
long  int  blocl^t  =  0; 

while  (temp2cylinder  !  =  NULL)  { 
blockest  s  0; 

tempcylentry  =  temp2cylinder->cylentrylist; 
while  (tempcylentry- >nextcylentry  !=  NULL)  { 
tempcylentry- >nodesch  =  0; 
blockut  »  blockest  +  minwidthavg; 
tempcylentry  =  tempcylentry- >nextcylentry; 

}; 

blockest  s  blockest  -f-  minwidthavg; 
tempcylentry- >  nodesch  =  0; 
while  (blockest  <  circum)  { 
tempcylentry- >nextcylentry  =  new  cylentrytype; 
tempcylentry- >nextcylentry->  widthstarttime  =  blockest; 
tempcylentry  =»  tempcylentry- >nextcylentry; 
blockest  «  blockest  +  minwidthavg; 

}; 

temp2cylinder  =  temp2cylinder- >  nextcylap; 

}; 

return  tempcylinder; 

}; 

//  Description  :  Returns  true  if  the  searched  node  is  an  ancestor  of  the 
//  queue  pointed  to  by  temp2ptrtoptr  output  node. 

//  Parameters  :  temp2ptrtoptr  -  ptrtoptr  to  a  queue  to  check  nodes 
boolean  cyltype  ::  ancestor(ptrtoptrtoaq  *temp2ptrtoptr,int  citprime)  { 
boolean  questanc  =  false; 
ptrtoptrtoaq  ^temp3ptrtoptr; 
while  (temp2ptrtoptr  !  =  NULL)  { 
if  (questanc  =  =  true)  { 
br^; 

}; 

if  (temp2ptrtoptr- >  getnodeoutQ  ==  -1)  { 

//  at  end  of  chain  so  do  nothing,  this  should  work  here  since 
//  here  we  are  dealing  with  the  graph  itself  and  not  the  cylinder 

} 


else  { 

if  (citprime  ==  temp2ptrtoptr->getnodeout0)  { 
questanc  =  true; 
break; 

}; 

tempSptrtoptr  =  gnodelisting->getgnodeoutputqslist(temp2ptrtoptr->getnodeout0); 
questanc  =  ancestor(teinp3ptrtoptr,citprime); 

}; 

temp2ptrtoptr  =  temp2ptrtoptr->getnextelenient0; 

}; 

return  questanc; 

}; 

//  Description  :  The  following  procedure  creates  the  dependencies  among  nodes  in  the 
cylinder. 

//  Paramaters  :  cylinder-  the  cylinder 
//  :  circum  -  the  circumference  of  the  cylinder 

//  :  widthavg  -  the  width  of  a  slot  in  the  cylinder 

dependencyqs  *cyltype  createdeps(cyltype  *cylinder,long  int  circum, 
long  int  widthavg)  { 
ptrtoptrtoaq  ’*temp2ptrtoptr, 

■•tempSptrtoptr; 
cyltype  '•tempcylinder; 
cylentrytype  *tempcylentry, 

•tempprimecylentry; 
dependencylist  *tempptrtodepptr; 

dependencyqs  •headdepq  =  NULL; 

dependencyqs  *^mphe^depq; 


long  int 

t. 

tprime. 

maxentrytotry, 

entrycount; 

boolean 

cicircumvalid  =  false: 

boolean 

alreadydep  =  false; 

boolean 

notable  =  true; 

gnode 

*tempgnodelisting; 

tempcylinder  =  cylinder; 
while  (tempcylinder  !  =  NULL)  { 
tempcylentry  =  tempcylinder- >cylentrylist; 
tempprimecylentry  =  tempcylinder- >cylentrylist; 
t  =  0; 

while  (t  <  circum)  { 

tempprimecylentry  =  tempcylinder- >cylentrylist; 
tprime  =  t  -I-  widthavg; 
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while  (tprime  <  circum)  { 
while  (tempcylentry->  widthstarttime  !=  t)  { 
tempcylentry  =  tempcylentry->nextcylentry; 

}; 

while  (tempprimecylentry->  widthstarttime  !=  tprime)  { 
tempprimecylentry  =  tempprimecylentry- >  nextcylentry; 

}; 

if  ((tempcylentry- >nodesch  >=  tempprimecylentry- >nodesch)  1 1 
(tempcylentry- >nodesch  ==  0))  { 

//  do  nothing  same  node  in  next  block 

//  of  cylinder  or  not  currently  a  node  scheduled  in  this  block 

} 

else  { 

tempheaddepq  =  headdepq; 

while  ((tempheaddepq  !=  NULL)  &&  (tempheaddepq- >nodefrom  !  = 
tempcylentry- >nodesch)  &&  (tempheaddepq- >nodeto  !  = 
tempprimecylentry- >nodesch))  { 
tempheaddepq  =  tempheaddepq- >nextdepq; 

}; 

if  (tempheaddepq  =  =  NULL)  { //  not  already  a  dependency  for  these 
temp2ptrtoptr  =  gnodelisting->getgnodeoutputqslist(tempcylentry-> 
nodesch); 

if  ((tempcylentry- >nodeicount  ==  tempprimecylentry- >nodeicount)  && 
(ancestor(temp2ptrtoptr,tempprimecylentry->  nodesch))  ==*  false)  { 
if  (tempcylentry-  >  ftomdepqs  =  =  NULL)  { 
if  (!(tempcylentry->fromdepqs  =  new  dependencylist))  { 

^)tintf(stderr, "Insufficient  memory  for  dependlst\n"); 
exit(l); 

}; 

if  (!(tempcylentry->fTOmdepqs->ptrtodepq  =  new  dependencyqs))  { 
fprintf(stdeiT, "Insufficient  memory  for  dependqs\n"); 
exit(l); 

}; 

tempcylentry- >fromdepqs->ptrtodepq->nodefTom  = 
tempcylentry-  >  nodesch; 

tempcylentry-  >  f^mdepqs-  >  ptrtodepq-  >  nodeto  = 
tempprimecylentry-  >  nodesch; 

if  (headdepq  =  =  NULL)  { 
headdepq  =  tempcylentry- >  fromdepqs- >  ptrtodepq; 

else  { 

tempheaddepq  =  headdepq; 
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while  (tempheaddepq-  >  nextdepq  !  =  NULL)  { 
tempheaddepq  =  tempheaddepq- >  nextdepq; 

}; 

tempheaddepq- >  nextdepq  =  tempcylentry- >  fromdepqs- > 
ptrtodepq; 

}; 


} 

else  { 

tempptrtodepptr  =  tempcylentry- >  fromdepqs; 
alr^ydep  -  false; 

while  ((tempptrtodepptr- >ptrtonextptrtodepq  !=  NULL)  && 
(alreadydep  =  =  false))  { 
if  (tempptrtodeppu*-  >  ptrtodepq-  >  nodeto  =  = 
tempprimecylentry->nodesch)  { 

//  a  dependency  alre^y  exists  for  this  pair  so  do  NOT 
//  create  another  one. 
alreadydep  =  true; 

} 

else  { 

tempptrtodepptr  =  tempptrtodepptr- >ptrtonextptrtodepq; 

}; 

}; 

if  (alreadydep)  { 

//  do  nothing  since  a  dependency  already  exists  for  this 

} 

else  { 

if  ('(tempptrtodepptr- >ptrtonextptrtodepq  =new  dependencylist))  { 
fprintf(stderr, "Insufficient  memory  for  deplst\n"); 
exit(l); 

}; 

if  (!(tempptrtodepptr->ptrtonextptrtodepq->  ptrtodepq  = 
new  dependencyqs))  { 

fprintf(stderr," Insufficient  memory  for  depqs\n"); 
exit(l); 

}; 

tempptrtodepptr-  >  ptrtonextptrtodepq-  >  ptrtodepq-  >  nodefrom 
=  tempcylentry- >  nodesch; 

tempptrtodepptr-  >  ptrtonextptrtodepq-  >  ptrtodepq-  >  nodeto  = 
tempprimecylentry-  >  nodesch; 

if  (headdepq  =  =  NULL)  { 
headdepq  =  tempptrtodepptr- >  ptrtodepq; 
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} 

else  { 

tempheaddepq  =  headdepq; 
while  (tempheaddepq- >nextdepq  !=  NULL)  { 
tempheaddepq  =  tempheaddepq- >nextdepq; 

}; 

tempheaddepq- >nextdepq  =  tempptrtodepptr- >  ptitodepq; 

}; 

}; 

}; 

if  (alicadydep  =»  =  false)  { 
if  (tempprimecylentry->todepqs  ==  NULL)  { 
if  (!(tempprimecylentry->todepqs  =  new  dependencylist))  { 
^prmtf(stderr,  "Insufficient  memory  for  deplst\n"); 
exit(l); 

}; 

tempprimecylentry->todepqs->ptrtodepq  =  tempheaddepq- > 
nextdepq; 

} 

else  { 

tempptrtodepptr  =»  tempprimecylentry->todepqs; 
while  (tempptrtodepptr- >ptrtonextptrtodepq  I  -  NULL)  { 
tempptrtodepptr  =  tempptrtodepptr- >ptitonextptrtodepq; 

}; 

if  (!(tempptrtodepptr->ptrtonextptrtodepq  = 
new  dependencylist))  { 

fprintf(stderr, "Insufficient  memory  for  deplst\n"); 
exit(l); 

}; 

tempptrtodepptr- >ptrtonextptrtodepq->ptrtodepq  = 
tempheaddepq-  >  nextdepq; 

}; 

}; 

}; 

}; 

}; 

tprime  =  tprime  +  widthavg; 

}; 

t  =  t  +  widthavg; 

}; 

tempcylinder  =  tempcylinder->nextcylap; 

}; 

//  add  a  dependency  from  end  to  all  input  nodes 
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tempcylinder  =  cylinder, 
maxoitrytotry  »  drcum  /  widthavg; 
while  (ciciicumvalid  » »  false)  { 
tempcyl«itry  =  tempcylinder- >cylentrylist; 
entrycount  =?  0; 

while  ((tempcylentry-  >  nextcylentry !  =  NULL)  &&  (entrycount  <  maxentrytotry)) 

{ 

entrycount++; 

tempcylentry  =  tempcylentry- >  nextcylentry; 

}; 

if  ((tempcylentry- >nodesch  !=  0)  &&  (tempcylentry- >  nodeicount  ==  0))  { 
cicircumvalid  »  true; 

} 

else  { 

tempcylinder  »  tempcylinder- >nextcylap; 
if  (tempcylinder  =  =  NULL)  { 
maxentrytotry  =  maxentrytotry  -  1; 
tempcylinder  »  cylinder; 

}; 

}; 

}; 

tempgnodelisting  =  gnodelisdng; 
while  (tempgnodelisting  ! »  NULL)  { 

temp2ptrtoptr  =  tempgnodelisting- >getgnodemputqslist(tempgnodelisting-> 
getnodeidO); 
while  (temp2ptrtoptr  !  =  NULL)  { 
if  (temp2ptrtoptr->gemodein0  =*  *1)  { 

//iop  node  so  get  the  next  node  following  the  iopnode 
temp3ptrtoptr  s  tempgnodelisting- >getgnodeoutputqslist(tempgnodelisting-> 
gemodeidO); 

while  (temp3ptrtoptr  !  =  NULL)  { 

//  set  dependencies  to  all  these  nodes  from  the  end  of  the  cylinder 
if  (tempcylentry-  >  fromdepqs  =  =  NULL)  { 
if  (!(tempcylentry->  fromdepqs  =  new  dependencylist))  { 
fprintf(stderr,  "Insufficient  memory  for  dependlist\n"); 
exit(l); 

}; 

if  (!  (tempcylentry- >ftx)mdepqs->ptrtodepq  =  new  dependencyqs))  { 
fprintf(stderr,  "Insufficient  memory  for  dependencyqs\n"); 
exit(l); 

}; 

tempcylentry- >  fromdepqs- >ptrtodepq->nodefrom  =  tempcylentry- > 
nodesch; 
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tempcylentry->fromdepqs->ptrtodepq->nodeto  =  temp3ptrtoptr- > 
getnodeoutO; 

tempcylentry->fromdepqs->ptrtodepq->dq)tokensize  =  1; 
if  (headdepq  =  =  NULL)  { 
headdepq  «  tempcylentry->froindqxis->ptrtodepq; 

} 

else  { 

tempheaddepq  -  headdepq; 
while  (tempheaddepq- >nextdepq  !=  NULL)  { 
tempheaddepq  »  tempheaddepq- >nextdepq; 

}; 

tempheaddepq- >  nextdepq  =  tempcylentry- >  fromdepqs- >  ptrtodepq; 

}; 

} 

else  { 

tempptrtodepptr  =  tempcyientry->  fromdepqs; 
alie^ydep  »  false; 

while  ((tempptrtodepptr- >ptrtooextpf^xiepq  !=  NULL)  && 

(alreadydep  =**  fialse))  { 

if  (tempptrtodepptr- >  ptrtodepq- >nodeto  ==  tempSptrtoptr- > 
getnodeoutO){ 

//  a  dependency  already  exists  for  this  pair  so  do  NOT 
//  create  another  one. 
alreadydep  =  true; 

} 

else  { 

tempptrtodepptr  =  tempptrtodepptr- >ptrtonextptrtodepq; 

}; 

}; 

if  (alreadydep)  ( 

//do  nothing  since  a  dependency  already  exists  for  this 

} 

else  { 

if  (!(tempptrtodepptr->ptrtonextptrtodepq  ==  new  dependencylist))  { 
fprintf(stderr, "Insufficient  memory  for  deplist\n"); 
exit(l); 

}; 

if  (.’(tempptrtodepptr- >ptnonextptrtodepq->  ptrtodepq  =  new 
dependencyqs)){ 

fprintf(stderr, "Insufficient  memory  for  depqsXn"); 
exit(l); 

}; 

tempptrtodepptr-  >  ptrtonextptrtodepq-  >  ptrtodepq-  >  nodefrom  = 
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tempcylentry-  >  nodesch; 

tempptrtcxlepptr->ptrtonextptrtodqxi->ptrtodepq->nodeto  = 
temp3ptrtoptr-  >  getnodeoutO; 

tempptrtodq)ptr-  >  ptrtonextptrtodepq-  >  ptrtodepq-  >  deptokensize 
if  (headdepq  =  =  NULL)  { 

headdepq  =  tempptrtodepptr->  ptrtonextptrtodepq- >  ptrtodepq; 

} 

else  { 

tempheaddepq  «  headdepq; 
while  (tempheaddepq- >nextdepq  !=  NULL)  { 
tempheaddepq  =  tempheaddepq- >nextdepq; 

}; 

tempheaddepq- >nextdepq  =  tempptrtodepptr- > 
ptrtonextptrtodepq-  >  ptrtodepq; 

}; 

}; 

}; 

tempSptrtoptr  »  temp3ptrtoptr->getnextelement0; 

}; 

}; 

temp2ptrtoptr  =  temp2ptrtoptr->gemextelement0; 

}; 

tempgnodelisting  =  tempgnodelisting->getnextgnodeO; 

}; 

tempheaddepq  »  headdepq; 

printf("\nThe  following  dependencies  have  been  assigned:\n’’); 
while  (tempheaddepq  !  -  NULL)  { 
printf("From;  "); 

printf("  %d"  .tempheaddepq-  >  nodefrom); 
prmtf("  To:  "); 

printf("  %d"  .tempheaddepq- >  nodeto); 
printf("\n"); 

tempheaddepq  =  tempheaddepq- >  nextdepq; 

}; 

return  headdepq; 

}; 

//  Description  :  Increments  or  decrements  the  tokens  as  required  based  on  the 
//  call,  depending  on  whether  or  not  it  is  at  the  head  or  tail 

//  of  the  dependency. 

//  Parameters  :  tempdeplist  -  the  list  of  dependencies 
//  idnode  -  the  id  number  of  the  node  to  adjust 

void  dependencyqs  ::  adjusttokens(dependencyqs  *tempdeplist,int  idnode)  { 
while  (tempdeplist !  =  NULL)  { 
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if  (tempdeplist-  >  nodefrom  =  =  idnode)  { 

(tempdeplist-  >  deptokensize) + + ; 

} 

else  { 

if  (tempdeplist- >nodeto  ==  idnode)  { 

(tempdeplist-  >  deptokensize)—; 

}; 

}; 

tempdeplist  =  tempdeplist- >nextdepq; 

}; 

}; 

//  Description  :  Checks  to  see  if  the  token  condition  of  greater  than  zero  is 
//  satisfied. 

//  Parameters  :  tempdeplist  •  the  list  of  dependencies 
//  idnode  -  the  id  number  of  the  node  to  adjust 

boolean  dependencyqs  ::  checktokBns(dependa)cyqs  *tempdeplist,int  idnode)  { 
boolean  oktoexec  »  true; 
while  (tempdeplist !  =  NULL)  { 
if  (tempdeplist- >nodeto  =-  idnode)  { 
if  (tempdeplist- >  deptokensize  >  0)  { 
oktoexec  =  true; 

} 

else  { 

oktoexec  -  false; 
break; 

}; 

}; 

tempdeplist  -  tempdeplist- >nextdepq; 

}; 

return  oktoexec; 

}; 

//  Description  :  For  each  daughter  of  the  last  node  inserted  into  the  cylinder 
//  update  the  earliest  start  time.  Recursively  calls  daughters 

//  of  daughters  until  the  end  of  the  graph  is  reached. 

//  Parameters  :  temp2ptrtoptr  -  ptr  to  ptr  to  a  queue  to  adjust 
//  q2temp  -  the  graph  with  the  rest  of  the  nodes 

//  tpluswidth  -  the  time  to  adjust  est  to 

void  topgraph  recestupdt(ptrtoptrtoaq  '*temp2ptrtoptr,topgraph  ’•q2temp, 
long  int  tpluswidth)  { 
ptrtoptrtoaq  *temp3ptrtoptr; 
topgraph  *q3temp; 

while  (temp2ptrtoptr  !  =  NULL)  { 
if  (temp2ptrtoptr- >  getnodeoutO  ==  -1)  { 
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cerr  <  <  "\nERROR  DURING  RECESTUPDT,  OUTPUTNODE  IN 
CYLINDER\n"; 

} 

else  { 

q3temp  =  q2temp; 

while  ((qStemp- >  id  !=  temp2pmoptr->getnodeout0)  && 

(q3temp-  >  next !  =  NULL)  &&  (q3temp  !  =  NULL)){ 
q3temp  =  q3teinp->next; 

}; 

if  (q3temp->id  ==  temp2ptrtoptr->getnodeout0)  { 
if  (tpluswidth  >  q3temp->est)  { 
q3temp->est  =  tpluswidth; 

}: 

temp3ptrtoptr  =  gnodelisting->getgnodeouq)utqslist(q3temp->id); 
recestupdt(temp3ptrtoptr,q2temp,tpluswidth); 

};  //else  q3temp-  >  next  =  =  NULi  so  at  the  end  of  this  chain 

}; 

temp2ptrtoptr  =  tenip2ptrtoptr->getnextelenient0; 

}; 

}; 
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APPENDIX  B:  PGM  REPRESENTATION  CODE 


//  Description  :  The  following  procedure  documents  the  order  of  loading  the  queue  data 
//  :  into  the  cylinder  and  is  included  here  for  that  reason, 

void  gqueue  ::  loadqueuesQ  { 
int  numqueues  »  0; 
gqueue  *tempgqueuelisting  -  NULL; 
cin  >  >  numqueues; 

for  (int  queueloop=l;queueloop<  =numqueues;queueloop++)  { 
if  (gqueuelisting  =  =  NULL)  {  . 

if  (!(gqueuelisting  =  new  gqueue))  { 
fprintf(stderr, "Insufficient  memory  for  gqueue\n"); 
exit(l); 

}; 

cm  >  >  gqueuelisting- >gqueueid; 

//  NOTE  GMID  IS  ASSIGNED  BY  LOADGRAPH  ON  ADJUSTMENT  OF  PTRS 

cin  >>  gqueuelisting- >  nodein; 

cin  >>  gqueuelisting- >nodeout; 

cin  >  >  gqueuelisting- >datarate; 

cin  >  >  gqueuelisting- >  overthreshold; 

cin  >  >  gqueuelisting- >productionqty; 

cin  >  >  gqueuelisting- >overc^)acity; 

tempgqueuelisting  =  gqueuelisting; 

} 

else  { 

if  (!(tempgqueuelisting->nexteiement  =  new  gqueue))  { 
fprintf(stderr," Insufficient  memory  for  gqueue\n"); 
exit(.l); 

}; 

cin  >  >  tempgqueuelisting->nextelement->gqueueid; 
cin  >  >  tempgqueuelisting-  >  nextelement-  >  nodein; 
cin  >  >  tempgqueuelisting- >  nextelement- >nodeout; 
cin  >  >  tempgqueuelisting- >  nextelement- >datarate; 
cin  >  >  tempgqueuelisting- >  nextelement- >  overthreshold; 
cin  >  >  tempgqueuelisting- >  nextelement- >productionqty; 
cin  >  >  tempgqueuelisting- >  nextelement- >  overcapacity; 
tempgqueuelisting  =  tempgqueuelisting- >  nextelement; 

}; 

}; 

}; 
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//  Description  :  The  following  procedure  documents  the  loading  of  the  graph  ’jito  the 
simulator 

//  ;  and  is  included  here  for  that  reason, 

long  int  gnode  ::  loadgraph(int  numiops,int  numgms,int  numaps, 
ioprocessors  ^tempioplist)  { 
int  numnodes  =:  0; 

gnode  ’^tempgnodelisting  =  NULL; 
int  numgmtoassign  =  0; 

int  isiopnode  =  numaps; 

int  numiopnodes  =  0; 

long  int  sumexectimes  0; 
cin  >  >  numnodes; 

for  (int  nodeloop=l;nodeloop<  =numnodes;nodeloop+ +)  { 
if  (gnodelisting  *  =  NULL)  { 
if  (!  (gnodelisting  =  new  gnode))  { 

^rintf(stderr,  "Insufficient  memory  for  gnode\n"); 
exit(l); 

}; 

cin  >  >  gnodelisting’ >nodeid; 
cin  >  >  isiopnode; 
if  (isiopnode  !  =  0)  { 

gnodelisting’ >  iopidassigned  =  ((nodeloop+ 1)  %  numiops)+l; 
tempioplist’  >  assignnodetoiop(tempioplist, gnodelisting’  >  nodeicf, 
gnodelisting’  >  iopidassigned); 
numiopnodes++; 

}; 

cin  >  >  gnodelisting’ >aissize; 

cin  >  >  gnodelisting’ >  primtime; 

sumexectimes  =  sumexectimes  +  gnodelisting’ >  primtime; 

//  Now  update  queue  pointers 
numgmtoassign  =  ((nodeloop+ 1)  %  numgms)+l; 
gnodelisting’ >ptrtoinqlist  =  gnodelisting- >ptrtoinqlist->establishinptrs( 
gn^elisting-  >  ptrtoinqlist,  numgmtoassign); 
gnodelisting-  >  ptrtooutqlist  =  gnodelisting-  >  ptrtooutqlist-  >  establishoutptrs( 
gnodelisting-  >  ptrtooutqlist); 
tempgnodelisting  =  gnodelisting; 

} 

else  ( 

if  (!(tempgnodelisting->nextgnode  =  new  gnode))  { 
fprintf(stderr,  "Insufficient  memory  for  gnode\n"); 
exit(l); 

}; 

cin  >  >  tempgnodelisting- >nextgnode->nodeid; 
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cin  >  >  isiopnode; 
if  (isiopnode  !  =  0)  { 

tempgnodelisting- >  nextgnode- >  iopidassigned  =  ((nodeloop+1)  % 

nuimops)+l; 

tempioplist-  >  assignnodetoiop<tempioplist,tempgnodelisting-  >  nextgnode-  >  nodeid, 

tempgnodelisting-  >  nextgnode-  >  iopidassigned); 
numiopnodes-l-+; 

}; 

cin  >>  tempgnodelisting- >  nextgnode- >aissize; 

cin  >  >  tempgnodelisting- >  nextgnode- >pnmtime; 

sumexectimes  =  sumexectimes  +  tempgnodelisting- >  nextgnode- >  primtime; 

//  Now  load  queue  data  for  queues  associated  with  this  node 
numgmtoassign  =  ((nodeloop-H  1)  %  numgms)-l-l; 

tempgnodelisting-  >  nex^ode-  >  ptrtoinqlist  =  tempgnodelisting-  >  nextgnode-  > 
ptrtoinqlist-  >  establishinptrs( 

tempgnodelisting-  >  nextgnode-  >  ptrtoinqlist-  > 
numgmtoassign); 

tempgnodelisting- >  nextgnode- >ptrtooutqlist  -  tempgnodelisting- >  nextgnode 

-  >  ptrtooutqlist-  >  establishoutptrs( 
tempgnodelisting-  >  nextgnode-  >  ptrtooutqlist) ; 
tempgnodelisting  =»  tempgnodelisting- >  nextgnode; 

}; 

); 

return  (sumexectimes); 

}; 
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APPENDKC:  MAIN  SIMULATOR  CODE 


//  The  following  constitutes  the  main  program  of  the  simulator 
//  Processiop  is  called  first  to  process  the  input/output  processors. 

//  Processbus  is  then  called  for  cbus  and  dtn  to  process  them. 

//  Processgm  is  then  called  to  process  the  global  memories. 

//  Processsch  is  then  called  to  process  the  scheduler-  > 

//  Processap  is  then  called  to  process  aps. 
mainO  { 

//  The  variables 
schprocessor  *scheduler; 
ioprocessors  *ioplist  =  NULL; 
gmprocessors  *gmlist  =  NULL; 
approcessors  *aplist  =  NULL; 
boolean  breakdownpriority  =  false; 
boolean  invalidrun  =  true; 
int  numberiops  -  0; 

int  numbergms  »  0; 

int  numbcraps  *  0; 

int  instancestart  =  0; 

int  instancefinish  =  0; 

int  runcase  =  0; 

long  int  simtime  » 0; 

dependencyqs  •headofdepUst  =  NULL; 
topgraph  ’•dummy  =  NULL; 
gnodeUsting  =  NULL; 
gqueuelisting  =  NULL; 

cout  <  <  "This  simulator  expects  the  graph  data  to  be  in  a  file  ’graph’  and"; 
coii.  <  <  "  the  EMSP  Structure  to  be  in  IOP,GM,AP  Order  in  a  file  ’emspstru’\n” 
cout  <  <  "\nEnter  the  starting  node  instance  for  examination  "; 
cin  >  >  instancestart; 

cout  <  <  "\nEnter  the  completing  node  instance  for  examination  "; 
cin  >  >  instancefinish; 

cout  <  <  "\n(l)  Enter  1  for  FCFS  scheduled  run."; 
cout  <  <  "\n(2)  Enter  2  for  RC  scheduled  run.\n"; 
cin  >  >  runcase; 

if  ((runcase  <  1)  1 1  (runcase  >  2))  { 
cout  <  <  "\nlnvalid  run  entered  assuming  FCFS  run\n"; 
runcase  =  1; 

}; 
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//  establish  the  Simulator  structure 
if  (freopen("emspstru'*,"r",stdin)  ==  NULL)  ( 
fprintfCstderr,  "error  redirecting  stdin\n"); 

} 

else  { 

cin  >  >  numberiops; 

ioplist  =  ioplist->establishiops(numberiops,ioplist); 
cin  >  >  numbergms; 

gmlist  gmlist->establishgms(numbergms,gmlist); 
cin  >  >  numberaps; 

aplist  s  aplist->establishaps(numberaps,aplist); 
fclose(stdin); 

}; 

if  (!  (scheduler  =  new  schprocessor))  { 
fprintf(stderr, "Insufficient  memory  for  schprocessor\n"); 

«it(l); 

}; 

scheduler-  >  setobjectid(l, numberaps); 

//  Establish  the  graph  structure  by  reading  in  the  data  required 
if  (freopen("graph","r",stdin)  =*  NULL)  { 
fprintf(stderr,  "error  redirecting  stdinXn"); 

} 

else  { 

gqueuelisting-  >  loadqueuesQ; 

simtime  =  gnodelisting->  lor  dgraph(numberiops,numbcrgms,numbeTaps, ioplist); 
fclose(stdin); 

}; 

simtime  =  simtime  •  instancefinish; 

cout  <  <  "\nThe  expected  simulation  time  in  order  to  ensure  last  node  " ; 
cout  <  <  "\ninstance  completion  is:  "; 
cout  <  <  simtime; 

Cout  <  <  "  microseconds.  \n"; 
if  (freopen("results","w",stdout)  ==  NULL)  ( 
fprintf(stderr,  "error  redirecting  stdout\n"); 

} 

else  { 

printf("The  Simulation  Time  is:  "); 
printf("  %ld"  .simtime); 
printf("\n"); 
if  (runcase  -=  2)  { 

headofdeplist  =  dummy- >assignrc(gnodelisting, numberaps); 

}; 

for  (clock =0;clock<  sssimtime-.clock-H-H)  { 
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ioplist-  >  processiop(ioplist); 
cbus.pTocessbus(scheduler,ioplist,aplist,gmlist); 
for  (int  i=0;i<  16;i++)  { 
dtn[i].processbus(scheduler,ioplist,aplist,gmlist); 

}; 

gmlist-  >  processgm(gmlist,apIist); 
scheduler-  >  processsch(instancestart,instancefinish,runcase, 
headofdeplist); 

aplist-  >  processap<^list,breakdownpriority,scheduler,instancestart, 
instancefimsh); 

}; 

^list-  >  calcaunotbusytinie(aplist); 

gnodelisting-  >  calcalLnodeinstavgtime(instancestart,instancefinish); 
gqueuelisting-  >  calcqueuetimes(instancestart,instancefiiiish); 
fclose(stdout); 

}; 

} 


APPENDIX  D:  INPUT/OUTPUT  PROCESSOR  CODE 

//  Description  :  For  every  iop  do  the  following 
//  generate  instructions  for  any  external  inputs  that  require  it 

//  if  there  is  a  transfer  in  progress  do  nothing  otherwise 

//  check  to  see  if  currently  processing  and  time  to  be  done 

//  processing,  if  it  is  then  update  processing  status  and  place 

//  information  in  queue. 

//  if  not  processing,  then  get  the  next  node  from  the  head  of  the 

//  queue  and  process  it. 

//  go  to  next  iop 

//  Calls  :  getcurrinst  -  to  get  the  currenmodes  instruction 
//  getgnodenum  -  to  get  the  currenmodes  id  number 

//  getgnodeinputqslist  >  to  get  the  pointer  to  the  nodes  inputqs 

//  getgnodeoutputqslist-  to  get  the  pointer  to  the  nodes  outputqs 

//  setfields  •  to  set  the  new  instructions  fields 

//  gemextelement  -  to  get  the  next  queue  information 

//  updatebusytill  >  to  update  the  objects  busytill  time 

//  setinst  *  to  only  change  the  instruction  name 

void  ioprocessors  ::  processiop(ioprocessors  ^listofiops)  { 
ioprocessors  *templistofiops  =  listofiops; 
list  '^tempnodesassigned; 

while  (templistofiops  !  =  NULL)  { 
tempnodesassigned  =  templistofiops- >nodesassigned; 
while  (tempnodesassigned  !  =  NULL)  { 
templistofiops-  >  ioobject.generateinsts(tempnodesassigned); 
tempnodesassigned  =  tempnodesassigned- >  nextentry; 

}; 

if  (templistofiops- >ioobject.xferinprogressO)  { 

//  do  nothing 

} 

else  { 

if  ((templistofiops- >ioobject.isprocessingO)  && 

(templistofiops- >ioobject.finishtimeO))  { 
templistofiops-  >  ioobject.setprocessing(f;^se); 
templistofiops-  >  ioobject.placeinqueueO; 

}; 

if  (templistofiops- >ioobjdct.isprocessing()  ==  false)  { 
if  (templistofiops-  >  ioobject.gemexmodeO)  { 
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templistofiops-  >  ioobject.processiopnodeO; 

}; 

}; 

}; 

templistofiops  =  templistofiops- >  nextiop; 

}; 

}; 

//  Description  :  Determine  the  current  instruction 
//  Based  on  this  current  instruction  update  the  fields 

//  nad  generate  a  new  instruction  if  required. 

//  Calls  :  getcurrinst  -  to  get  the  current  instruction 
//  getgnodenum  -  to  get  the  node  number 

//  getgnodeinputqslist  -  to  get  the  list  of  input  queues 

//  setfields  -  to  set  the  fields  for  the  instruction 

//  getgqueueid  -  to  get  the  queue  id  number 

//  getgmid  -  to  get  the  gm  id  number 

//  gemextelement  -  to  get  the  next  element  in  the  list 

//  updatd)usytill  -  to  update  the  busytill  time 

//  assignloc  -  to  assign  the  location 

//  getqthresh  -  to  get  the  queue  threshold  value 

//  getqofgnodenum  -  to  get  the  queue  number 

void  object ::  processiopnodeO  { 
ptrtoptrtoaq  *gnqptr, 
int  gnodenum, 

numrequired  =  0; 
objectnode  *tempnodeptr; 
tempnodeptr  =  currentnode; 
if  (tempnodeptr  !  =  NULL)  { 
switch  (tempnodeptr- >getcuninstO)  { 
case  eis:  { 

gnodenum  -  tempnodeptr- >getgnodenumO; 
gnqptr  =  gnodelisting->getgnodeinputqslist(gnodenum); 
while  (gnqptr  !  =  NULL)  { 
numrequired++; 

tempnodeptr-  >  setfields(rq,gnodenum,gnqptr-  >  getgqueueidO,gm, 
gnqptr- >  getgmidO,iop,objectid); 
gnqptr  =  gnqptr- >  gemextelementQ; 
if  (gnqptr  !  =  NULL)  { 
if  (tempnodeptr-  >  nexmode  =  =  NULL)  { 
if  (!(tempnodeptr->  nexmode  =  new  objecmode))  { 
fprintf(stderr, "Insufficient  memory  for  objecmode\n"); 
exit(l); 

}; 
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tempnodq)tr- >  nextnode- >  nextnode  =  NULL; 

}; 

tenipnodq}tr  =  tenipnodeptr->  nextnode; 

}; 

}; 

setpiocessing(true); 

up^tebusytill(40+0.22^umrequiied); 

break; 

}; 

caseaq:  { 

tempnc^eptr-  >  setinst(cq); 
tempnodeptr->locto  *  tempnodeptr->locfrom; 
tempnodeptr-  >  locfrom.assignloc(iop,objectid); 
setpiocessing(true); 

up^tebusytill(20+0. 1 1  ’•gqueuelisting-  >  getqthresh(tempnodeptr-  > 
getqofgnodenumQ)); 

break; 

}; 

}; 

}; 

}; 
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APPENDIX  E:  GLOBAL  MEMORY  CODE 

//  Description  :  For  every  gm  do  the  following 
//  if  there  is  a  transfer  in  progress  do  nothing  otherwise 

//  check  to  see  if  currently  pmcessing  and  time  to  be  done 

//  processing,  if  it  is  tbon  update  processing  status  and  place 

//  information  in  quevc. 

//  if  not  processing,  then  get  the  next  node  from  the  head  of  the 

//  queue  and  process  it. 

//  go  to  next  gm 

//  Calls  :  xferinprogress  -  to  determine  if  currently  xfering  data 
//  isprocessing  -  to  determine  if  currently  processing 

//  finishtime  -  to  determine  if  complete  processing 

//  setprocessing  -  to  update  the  processing  status 

//  placeinqueue  -  to  place  in  cbus  or  dtn  queue 

//  gemextnode  -  to  get  the  next  node  from  the  head  of  q 

//  processgmnode  •  to  process  the  node 

void  gmprocessors  ::  processgmfgmprocessors  ’*listofgms,approcessors  '"Ustofaps)  { 
gmprocessors  *templistofgms  =  listofgms; 
while  (templistofgms  !  =  NULL)  { 
if  (tempUstofgms->gmobject.xferinprogressO)  { 

//  do  nothing 

} 

else  { 

if  ((templistofgms- >gmobject.isprocessingO)  && 

(templistofgms-  >  gmobject.  fWshtimeO))  { 
templistofgms-  >  gmobject.setprocessing(false); 
templistofgms-  >  gmobject.  pla^inqueueQ; 

}; 

if  (templistofgms-  >  gmobject.isprocessingO  =  =  false)  { 
if  (templistofgms- >gmobject.getnextnodeO)  { 
templistofgms-  >  gmobject.  processgmnode(listofaps); 

}; 

}; 

}; 

templistofgms  =  templistofgms- >  nextgmproc; 

}; 

}; 

//  Description  :  Get  the  currentnodes  instruction 

//  Based  on  that  nodes  instruction  generate  another  appropriate 
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//  instruction  for  it  and  update  the  fields  involved. 

//  Calls  :  getcurrinst  •  to  get  the  currentnodes  current  inst 
//  setinst  •  to  set  the  instruction 

//  •  assignloc  -  to  assign  the  location 

//  getqthresh  -  to  get  the  queue  threshold  value 

//  getqofgnodenum  •  to  get  the  queue  id  number 

//  setgnodenum  -  to  set  the  new  node  number 

//  getqnodeoutnum  -  to  get  the  queue  id  number  for  out  node 

//  addtolength  -  to  increase  the  size  of  that  queue 

//  subfromlength  •  to  decrease  the  size  of  that  queue 

//  updatebusytill  -  to  update  the  busytill  time 

void  object ::  processgmnode<approcessors  '*temp2aplist)  { 
size  tempsize  »  ut; 

gqueue  *tempgqueuelist  -  gqueuelisting; 
switch  (currentnode->getcurTinstO)  { 

case  sis:  { 

cunentnode-  >  setinst(ais); 

currentnode->locfiom  =  curTcnmode->locto; 

curTentnode->locto  =  cuTTenmode->locassoc; 

setprocessing(true); 

updatebusytill(44); 

break; 

}; 

case  rq:  { 

currentnode->locto  =  currentnode- >  locfirom; 

currentnode>  >  setinst(aq); 

currenmode-  >  locfrom.assignloc(gm,objectid); 

setprocessing(true); 

up^tebusytillf  10+0. 1 1  *tempgqueuelist- >  getqthreshfcurrentnode- > 
getqofgnodenumQ)); 

break; 

}; 

case  wq:  { 

if  (currentnode- >  locfirom.  getlocationO  ==  ap)  { 
temp2aplist-  >  updatebdstatus(temp2aplist,currenmode-  >  locfrom. 
getlocationnumO); 

}; 

tempsize  =  tempgqueueiist->addtolength(currenmode->getqofgnodenum()); 
if  (tempsize  =  =  ot)  { 

currentnode- >  locfrom  =  currenmode- >locto; 
currenmode-  >  setinst(qot); 
currenmode-  >  Iocto.assignloc(sch,  1); 

currenmode-  >  setgnodenum(tempgqueuelist-  >  getqnodeoutnum(currenmode-  > 
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getqofgnodenumO)); 


updatebusytill(12); 

} 

else  if  (tempsize  =  =  oc)  { 
airrentnode'>locfrom  =  currentncxie->locto; 
cuirentnode-  >  setmst(qoc); 
currentnode*  >  locto.assignloc(sch,  1); 

currentnode-  >  setgnodenum(tempgqueuelist-  >  getqnodeoutnum(currentnode-  > 

getqofgnodenumO)); 

updatebusytill(12); 

} 

else  { 

currentnode-  >  setinst(dest); 
updatd)usytill(12); 

}; 

setprocessing(true) ; 
break; 

}; 

case  cq:  { 

if  (currentnode- >locfrom.getlocationO  ==  ap)  { 
temp2aplist-  >  updatdxistatus(temp2aplist,currentnode-  >  locfrom. 

getlocationnumO); 

}; 

tempsize  =  tempgqueuelist->subfTomlength(currenmode-> 
getqofgnodenumO); 
if  (tempsize  =  =  uc)  { 
currenmode-  >  setinst(quc); 
currenmode- >  locfrom  =  currentnode- >locto; 
currentnode-  >  locto.assignloc(sch,  1); 
updatebusytill(12); 

} 

else  { 

currentnode-  >  setinst(dest); 
updatebusytill(  1 0) ; 

}; 

setprocessing(true) ; 
br«dc; 

}; 

}; 

}; 
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/ 


APPENDIX  F:  SCHEDULER  CODE 


//  Description  :  The  following  function  follows  the  scheduler  description  in  Chapter  FV. 
//  Calls  :  isprocessing  -  to  check  the  processing  status 
//  finishtime  -  to  see  if  time  to  complete  processing 

//  setprocessing  *  to  set  the  processing  status 

//  placeinqueue  •  to  place  the  instruction  in  the  queue 

//  getnextnode  -  to  get  the  next  node  to  process 

//  retcuirinst  -  to  get  the  current  instruction 

//  addtolist  -  to  add  an  item  to  a  list 

//  subfromlist  -  to  remove  an  item  from  the  list 

//  setcuirinst  -  to  set  the  current  instruction 

//  getgnodemputqslist  >  to  get  the  list  of  input  queues 

//  getnodenum  -  to  get  the  node  id  number 

//  questionot  -  to  determine  if  queue  over  threshold 

//  getnextelement  -  to  get  the  next  queue  list  element 

//  gedopnumber  •  to  get  the  log  number 

//  estsink  -  to  establish  the  sink  node 

//  generatematch  -  to  match  ready  node  to  free  ap 

void  schpTOcessor  ::  processsch(int  nodeinststart,int  nodeinstfinish, 
int  runcase,dependencyqs  *tempheaddeplist)  { 

//  No  assignment  or  referencing  a  specific  object  required  since  there  is 
// only  one  scheduler- > 
list  ^tempfreeaplist, 

•tempreadynodelist, 

*terapexeclist; 

ptrtoptrtoaq  '*tempgnqptr; 
boolean  allot  »  true; 

boolean  match  =  false; 

boolean  alreadyexec  =  false; 

boolean  gensinknode  =  false; 

int  tempiopnum  =  0; 

int  queuecount  =  0; 

int  replications  »  1; 

if  (schobject.xferinprogressO)  { 

//  do  nothing 

} 

else  { 
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if  ((schobject.isprocessingO)  &&  (schobject.fimshtimeO)) 
schobject.setpiocessing(false); 
schobject.placeinqueueO; 


}; 

if  (schobject.isprocessingO* “false)  { 
if  (schobject.getnextnodeO==true)  { 
switch  (schobject-retcurrinstO)  { 
case  rfis:  { 

freeaplist  »  fTeeaplist->addtolist(fTeeaplist, 
schobject-getainfromlocnumO); 
schobject.setcurTinst(dest); 
schobject.setpiocessing(tnie); 
schobject.updatebusytill(  17); 
break; 


{ 


}; 

case  quc:  { 

if  (inhibitedlist-  >  inhibited(inhibitedlist,gqueuelisting-  > 
getqnodeinnum(schobject.getqueuenuinO)))  { 
inhibitedlist  =  inhibitedlist- >subfromlist(inhibitedlist. 


gqueuelisting-  >  getqnodeinnum(schobject. 
getqueuenumQ)); 

printf("NODE  BACK  UNDERCAPACITY:  "); 

printf("  %d"  .gqueuelisting-  >  getqnodeinnum(schobject.getqueuenumO)); 
printf("  at  clock:  "); 
printf("  %ld'’ .clock); 
printf("\n"); 

}; 

schobject.  setcurrinst(dest); 
schobject.setprocessing(true); 
schobject.  up^tebusytill(  1 0) ; 
break; 

}; 

case  qoc:  { 

if  (inhibitedlist-  >  inhibited(inhibiledlist,gqueuelisting-  > 
getqnodeinnum(schobject.getqueuenumO)))  { 

//  do  nothing  here  instruction  set  to  dest  later 

} 

else  { 

inhibitedlist  =  inhibitedlist- >addtolist(inhibitedlist, 
gqueuelisting-  >  getqnodeinnum(schobject. 
getqueuenumO)); 

printf("NODE  OVERCAPACITY:  "); 

printf("  %d"  .gqueuelisting-  >  getqnodeinnum(schobject. 
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getqueuenumO)); 
printf("  at  clock:  "); 
piintf("  %ld"  .clock); 
printf("\n"); 

}; 

}; 

case  qot:  { 

if  (inhibitedlist->inhibited(inhibitedlist,schobject. 

getnodenumO))  { 

//  do  nothing  now  checking  to  see  if  current  node  inhibited 

} 

else  { 

tempgnqptr  =  gnodelisting->getgnodeinputqslist(schobject. 
getnodenumO); 

gnodelisting-  >  incnumqsrec(schobject.getnodenumO); 
while  (tempgnqptr  !  =  NULL)  { 
queuecount++; 

if  (tempgnqptr- >questionotO)  { 

//  do  nothing  allot  already  true 

} 

else  { 

allot  =  false; 

}; 

tempgnqptr  =  tempgnqptr- >getnextelementO; 

}; 

if  (gnodelisting-  > areallqsrec(schobject.getnodenumO, 
queuecount))  { 

gnodelisting-  >  setnumqsrec(schobject.getnodenumO); 
tempiopnum  =  gnodelisting- >getiopnumber(schobject. 

getnodenumO); 
if  (tempiopnum  =  =  0)  { 
if  (allot)  { 

replications  =  gqueuelisting->getrepnumber(schobject. 
getqueuenumO); 

for  (int  i=l;i<  =replications;i+-F)  { 
readynodelist  =  readynodelist- >  addtolist( 
readynodelist.schobject. 
getnodenumO); 

gnodelisting-  >  calctimebtwnbdrl(schobject. 
getnodenumO, 
nodeinststart, 
nodeinstfinish); 

gnodelisting-  >  setinstance(schobject.  getnodenumO, 
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nodeinststart, 

nodeinstfinish,false); 

}; 

}; 

} 

else  { 

1 1  sink  node 
if  (aUot)  { 

gensinknode  =  true; 

}; 

}; 

}; 

}; 

schobject.setcuninst(dest); 
schobject.  setprocessing(true); 
schobject.  up^tebusytill(  17); 
break; 

}; 

}; 

if  (gensinknode)  { 

schobject.estsink(tempiopnum,nodeinststait,nodeinstfinish); 

} 

dse  { 

tempfreeapUst  =  freeaplist; 
tempreadynodelist  »  readynodelist; 

while  ((tempreadynodelist !  -  NULL)  &&  (match  =  =  false)){ 
if  (tempfTe«q)li^  !  =  NULL)  { 
tempexeclist  =>  executing^t; 
while  (tempexeclist !  =  NULL)  { 
if  (tempexeclist- >getnumberO  =  = 
tempreadynodelist- >getnumberO)  { 
alr^yexec  «  true; 
break; 

} 

else  { 

alieadyexec  =  false; 

}; 

tempexeclist  =  tempexeclist- >nextentry; 

}; 

if  (alreadyexec)  { 

//do  nothing  since  an  instance  of  this  node  is  already  exec- 

} 

else  { 
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if  (nincase  =  =  2)  { 

if  (tempheaddeplist-  >  checktokens(teinpheaddeplist, 
tempreadynodeiist-  > 
getnumb^))  { 

tempheaddeplist-  >  adjusttokens(tempheaddeplist, 
tempreadynodeiist-  > 
gemumb^); 
match  »  tree; 
schobjectupdatebusytill(3); 
schd)ject.gaieiatematch(rempreadynodelist-  > 
gemumberO.tempfreeaplist 
->getnumberO); 

executinglist  -  executinglist->addtolist( 
executinglist,schobject. 
gemodenurnO); 

freeaplist  =  freeaplist->subfipomlist(fTeeaplist, 
tempfre^list-> 
getnumberO); 

readynodelist  ^  readynodelist->subfromlist( 
readynodelistf 
schobject. 

getnodenumO); 

gnodelisting-  >  settimearTatap<schobject. 
gemodenurnO. 
nodeinststart, 
nodeinstfinish); 

gnodelisting-  >  calcnodeinsttime(schobject. 
gemodenurnO, 
nodeinststart, 
nodeinstfinish); 

tempfreeaplist  =  tempfTBeaplist->nextentry; 

} 

else  { 

//  do  nothing  since  dependencies  are  not  met 

}; 

} 

else  { //  FCFS  case 
match  =  true; 

schobject.  updatebusytill(3);  - 

schobject.generatematch(tempreadynodelist-  >  gemurnberO, 
tempfreeaplist-  >  gemurnberO); 
executinglist  =  executinglist- >addtolist(executinglist, 
schobject.  gemodenurnO); 


103 


fireeaplist  =  freeaplist->subfiromlist(freeaplist, 
tempfireeaplist-> 
getnumberO); 

readynodelist  »  readynodelist->subfroinlist(readynodelist, 
schobject.getnodenumO); 

gnodelisting-  >  settimeaiTatap(schobject.getnodenumO, 
nodeinststart, 
nodeinstfinish); 

gnodelisting-  >  calcnodeinsttime(schobject.getnodenumO, 
nodeinststart, 
nodeinstfinish); 

tempfreeaplist  -  tempfreeapiist->nextentry; 

}; 

}; 

}; 

tempreadynodelist  =  tempieadynodelist->nextentry; 

}; 


e  { 

tempfreeaplist  -  freeaplist; 
tempreadynodelist  =  r^ynodelist; 

while  ((tempreadynodelist !  =  NULL)  8l&,  (match  =  =  false)){ 
if  (tempfreeaplist !  *  NULL)  { 
tempexeclist  ~  executinglist; 
while  (tempexeclist !  =  NULL)  { 
if  (tempexeclist- >getnumba<)  =  = 
tempr^ynodelist->gemumberO)  { 
alr^yexec  =  true; 
break; 

} 

else  { 

alreadyexec  =  false; 

}; 

tempexeclist  =  tempexeclist- >nextentry; 

}; 

if  (alreadyexec)  { 

//do  nothing  since  an  instance  of  this  node  is  already  exec 

} 

else  { 

if  (runcase  =  =  2)  { 

if  (tempheaddeplist-  >checktokcns(tempheaddeplist, 
tempreadynodelist-  > 


getnumberO))  { 

tempheaddeplist-  >  adjusttokens(tempheaddeplist, 
tempreadynodelist'  > 
getnumberO); 
match  =  true; 
schobject.updatebusytill(10); 
schobject.setprocessing(true); 
schobject.generatematch(tempreadynodelist-  > 
gemumberO.ternpfreeaplist 
->gemumberO); 

executinglist  =  executinglist->addtoiist( 
executinglist,schobject. 
getnodenumO); 

freeaplist  =  freeaplist->subfromlist(freeaplist, 
tempfreeaplist-> 
getnumbe^); 

readynodelist  =  readynodelist->subfromlist( 
readynodelist, 
schobject. 
getnodenumO); 

gnodelisting-  >  settimearratap(schobject. 

*  getnodenumO, 

nodeinststart, 
nodeinstfinish); 

gnodelisting-  >  calcnodeinsttime<schobject. 
getnodenumO, 
nodeinststart, 
nodeinstfinish); 

tempfreeaplist  =  tempfreeaplist- >  nextentry; 

} 

else  { 

//  do  nothing  since  dependencies  are  not  met 

}; 

} 

else  { //  FCFS  case 
match  =  true; 

schobject.generatematch(tempreadynodelist-  >  gernumberO, 
tempfreeaplist-  >  getnumberO); 
schobject.setprocessing(true); 
schobject.  up^tebusytill(  10); 
executinglist  =  executinglist- >addtolist(executingIist, 
schobject.gemodenumO); 
freeaplist  =  freeaplist- >subfromlist( freeaplist. 
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tempfTeeaplist-> 

getnumberO); 

readynodelist  =  readynodelist->subbromlist(readynodelist, 

schobject.getnodenumO); 

gnodelisting-  >  settimearratap(schobject.getnodenumO, 
nodeinststart, 
nodeinsdimsh); 

gnodelisting-  >  calcnodeinsttime(schobject.getnodenumO, 
nodeinststart, 
nodeinstflnish); 

tempfreeaplist  =  tempfreeaplist- >  nextentry; 

}; 

}; 

}; 

tempreadynodelist  =  tempreadynodelist- >  nextentry; 
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APPENDIX  G:  ARITHMETIC  PROCESSOR  CODE 


//  Description  :  For  each  ap  perform  the  following 

//  if  the  ap  is  currently  transferring  information  do  nothing 

//  otherwise  perform  the  following 

//  check  to  see  if  the  breakdown  is  complete  and  forward  on  info 

//  check  to  see  if  the  execution  is  complete  and  forward  on  to  bd 

//  check  to  see  if  setup  is  complete  and  forward  on  to  execution 

//  adjust  breakdown  and  setup  status 

//  if  control  unit  is  not  processing  setup  and  breakdown  then 

//  get  the  next  instruction  and  process  it 

//  go  to  next  ap 

//  Called  by  :  Main 

//  Calls  :  xferinprogress  -  to  determine  if  currently  xfering  data 
//  isprocessing  -  to  determine  if  currently  processing 

//  finishtime  -  to  determine  if  complete  processing 

//  setprocessing  •  to  update  the  processing  status 

//  placeinqueue  -  to  place  in  cbus  or  dm  queue 

//  placeinbreakdown  •  to  place  in  breakdown  queue 

//  destroynode  -  to  delete  the  current  node 

//  getexecnode  -  to  get  the  next  node  to  execute 

//  gemodenurn  -  to  get  the  node  number  of  this  node 

//  updatd}usytill  -  to  update  the  busytill  time 

//  sendrfis  >  to  send  the  rfis  to  the  scheduler 

//  gemextinst  -  to  fetch  the  next  instruction  for  cu 

void  approcessors  ::  processap(approcessors.*listofaps,boolean  bkdnpriority, 

schproccssor  '•tempsched/nt  nodeinststart, 
int  nodeinstfinish)  { 
approcessors  *templistofaps  =  listofaps; 
int  numnode; 

while  (templistofaps  !  =  NULL)  { 
if  ((templistofaps->apsetup.xferinprogressO)  |  j 
(templistofaps-  >  apbreakdown.xferinprogressO))  { 

//  do  nothing 

} 

else  { 

if  ((templistofaps- >numbdnodes  ==  0)  && 

(templistofaps- >apbreakdownstatus))  { 

templistofaps- >  apbreakdownstatus  =  false;  //  NEXT  LINE  NEW 
tempsched-  >  updateexeclist(templistofaps-  >  apbdholdnodenum); 


gnodelisting-  >  calctimemstatap(teinplistofops-  >  apbdholdnodenum, 
nodeinststart,nodeinstfinish); 

}; 

if  ((templistofaps->apbreakdown.isprocessingO)  && 
(templistof;^<>apbreakdown.finishtimeO)  Sc& 
(templistofaps->apbreakdownstatiis  ==  folse))  { 
templistof£q>s->2^reakdowiistatus  ==  true; 
templistof^'>  apbdholdnodenum  «  templistofaps->apbreakdown. 
getnodenumO; 

templistofaps-  >  apbreakdown.setprocessmg(£alse); 
templistofops-  >  apbreakdown.placeinqueueO; 

}; 

if  (templistofaps->apexecuting.isprocessLigO  ==  false)  { 
(templistofaps-  >  aunotbusytime) + + ; 

}; 

if  ((tempiistofaps->apexecutmg.isprocessingO)  && 
(templistofaps->apexecuting.f!nishtiffleO))  { 
if  ((templistofaps->apbreakdownstatus)  j  | 
(templistofaps->apbreakdown.isprocessingO))  { 

(templistofaps-  >  aunotbusytime) + + ; 

}; 

if  ((templistofaps- >apbreakdownstatus  ==  false)  && 
(templistofaps- >apbreakdown.isprocessingO  ==  false))  { 
templistofaps-  >  apexecuting.seq}rocessing(false); 
templistofaps-  >  placembreakdown(templistofaps-  >  apexecuting. 
retumexecnodeO);//MODIFICATION 

}; 

}; 

if  ((templistofaps- >apsetup.isproeessingO)  && 

(templistofaps  >  apsetup.finishtimeO))  { 
if  (templistofaps- >apsetupstatus  =  —  notstarted)  { 
templistofaps- >apsetupstatus  =  inprogress; 
templistofaps-  >  apsetup.setprocessing(false); 
templistofaps-  >  apsetup.placeinqueueQ; 

} 

else  { 

if  (templistofaps- >apsetupstatus  ==  inprogress)  { 
templistofaps-  >  apsetup.setprocessing(false); 
templistofaps-  >  apsetup.destroynodeQ; 

} 

else  //setupstatus  complete 

if  (templistofaps- >apexecuting.isprocessingO  ==  false)  { 
templistofaps- >apsetupstatus  notstarted; 
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templistofaps-  >  apsetup.setprocessing(false); 
templistofaps-  >  apexecuting.getexecnode(teinplistofaps-  > 
apsetup. 
getcurmodeO); 

templistofaps-  >  apexecuting.setprocessing(true); 
numnode  =  templistofaps- >apexecuting.getnodenuiiiO; 
templistofaps-  >  apexecuting.updatebusytill(gnodelisting-  > 
getprimtime< 
numnode)); 

templistofaps-  >  apexecuting.sendrfisO; 

}; 

}; 

}; 

if  ((templistofaps- >apsetup.isprocessingO  ==  false)  && 
(templistofaps- >apbreakdown.isprocessingO  ==  false))  { 
templistofaps-  >  getnextinst(bkdnpriority); 
if  ((templistofaps- >numsunodes  ==  ())  Sc& 

(templistofaps- >apsetupstatus  inprogress))  { 
templistofaps- >apsetupstatus  =  complete; 

}; 

}; 

}; 

templistofaps  =3  templistofaps- >nextapproc; 

}; 

}; 


// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 

// 


Description  :  Determine  the  current  instruction 
if  it  is  ais  then  do  the  following 
determine  the  node  number 
get  the  input  qs  list  that  goes  with  that  node  number 
for  every  entry  in  that  list  do  the  following 
establish  a  new  rq  instruction 
update  the  busytill  time 
if  it  is  aq  then  do  the  following 
if  it  is  the  last  one  that  we  are  waiting  on  then  prepare 
it  for  destruction 
update  the  busytill  time 
if  it  is  dest  then  do  the  following 
determine  the  nodenumber 
get  the  list  of  output  qs  associated  with  that  node 
for  every  entry  in  that  list  do  the  following 
generate  a  wq  instruction 
get  the  list  of  input  qs  associated  with  that  node 
for  every  entry  in  that  list  do  the  following 
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//  generate  a  cq  instruction 

//  update  the  busytill  time 

//  Calls  :  getcurrinst  -  to  get  the  currentnodes  instruction 
//  getgnodenum  -  to  get  the  currentnodes  id  number 

//  getgnodeinputqslist  -  to  get  the  pointer  to  the  nodes  inputqs 

//  getgnodeoutputqslist-  to  get  the  pointer  to  the  nodes  outputqs 

//  getaissize  -  to  get  the  ais  size  in  words 

//  setfields  -  to  set  the  new  instructions  fields 

//  getnextelement  -  to  get  the  next  queue  information 

//  updatebusytUl  -  to  update  the  objects  busytill  time 

//  sednst  •  to  only  change  the  instruction  name 

int  object ::  processapnode<int  numrequired)  { 
objectnode  *tempnodeptr; 
int  count, 

holdnwis, 
gnodenum; 
ptrtoptrtoaq  ^gnqptr, 

•tempgnqptr; 

tempnodeptr  »  currentnode; 
switch  (tempnodeptr->getcurrinstO)  { 
case  ais: 

{ 

gnodenum  =  tempnodeptr- >getgnodenumO; 
holdnwis  *  gnodelisting->getaissize(gnodenum); 
gnqptr  ==  gnodelisting->getgnodeinputqslist(gnodenum); 
tempgnqptr  =  gnqptr, 
while  (tempgnqptr  !  =  NULL)  { 
numrequi^++; 

tempnodeptr-  >  setfields(rq,gnodenum,tempgnqptr-  >  getgqueueidQ.gm, 
tempgnqptr- >  getgmidO,ap,objectid); 
tempgnqptr  =  tempgnqptr- >getnextelementO; 
if  (tempgnqptr  !  =  NULL)  { 
if  (tempnodeptr-  >  nextnode  =  =  NULL)  { 
if  (!(tempnodeptr->  nextnode  =  new  objectnode))  { 
fprintffstderr, "Insufficient  memory  for  objectnode\n"); 
exit(l); 

}; 

tempnodeptr- >  nextnode- >  nextnode  =  NULL; 

}; 

tempnodeptr  =  tempnodeptr- >  nextnode; 

}; 

}; 
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se^rocessing(true); 

updatebusytill(40+0.22*holdnwis); 

break; 

}; 

case  aq: 

{ 

if  (-numrequired  =  =  0)  { 
tempnodeptr-  >  setinst(dest); 

}; 

set|>rocessiiig(true); 

up^td)usytill(20+0. 1 1  ^gqueuelisting-  >  getqthresh(tempnodeptr-  > 
getqofgnodenumO)); 

break; 

}; 

case  dest:  //only  here  if  instruction  done  executing  coming  from  breakdn 

{ 

count=0; 

gnodenum  »  tempnodeptr- >getgnodenumO; 
gnqptr  »  gnodelisting->getgnodeouq)utqslist(gnodenum); 
tempgnqptr  »  gnodelisting->getgnodeihputqslist(gnodenum); 
while  (gnqptr  !=  NULL)  { 
count++; 

tempnodeptr-  >  setfields<wq,gnodenum,gnqptr-  >  getgqueueidO,gm, 
gnqptr- >  getgmidO,ap,objectid); 
gnqptr  —  gnqptr- >getnextelementO; 
if  ^qptr  !  =  NULL)  { 
if  (tempnodeptr- >nextnode  ==  NULL)  { 
if  (!(tempnodeptr->nextnode  =  new  objectnode))  { 
fprintf(stderr, "Insufficient  memory  for  objecmode\n"); 
cxit(l); 

}; 

tempnodeptr- >  nextnode- >  nextnode  =  NULL; 

}; 

}; 

if  (tempnodeptr-  >  nextnode  =  =  NULL)  { 
if  (!(tempnodeptr->  nextnode  =  new  objeemode))  { 
fprintf(stderr, "Insufficient  memory  for  objectnode\n"); 
exit(l); 

}; 

}; 

tempnodeptr  =  tempnodeptr- >  nextnode; 

}; 

while  (tempgnqptr  !  =  NULL)  { 
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count++; 

tempnodq>tr-  >  setfields(cq,gnodenum,tempgnqptr-  >  getgqueueidQ.gm, 
tempgnqptr-  >  getgmidO.ap.objectid); 
tempgnqptr  -  tempgnqptr- >getnextelement0; 
if  (tempgnqptr  !  =»  NULL)  { 
if  (tempnodeptr-  >  nextnode  =  =  NULL)  { 
if  (!(tempnodeptr->  nextnode  -  new  objecmode))  ( 
fprintf(stderr, "Insufficient  memory  for  objecmode\n”); 
exit(l); 

}; 

}; 

}; 

tempnodeptr  =  tempnodeptr- >  nextnode; 

}; 

seq>rocessing(true); 
updatd)usytill(12  +0. 1 1  *count); 
numiequired  =  count; 
break; 

}; 

}; 

return  numiequired; 

}; 
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APPENDIX  H:  INTER-COMMUNICATION  CODE 

//  Description  :  If  waiting  on  a  transfer  to  complete  then  check  to  see  if 
//  that  transfer  is  complete,  otherwise  commence  xfer.  If  not 

//  waiting  on  xfer  to  complete  then  check  to  see  if  currently 

//  processing  a  xfer  and  if  it  is  complete  then  complete  xfer. 

//  if  not  processing  a  transfer,  then  get  the  next  instruction 

//  and  process. 

//  Calls  :  checklocsxfering  -  to  check  locations  xfering 
//  commencxfer  -  to  b^in  the  transfer 

//  completexfer  -  to  complete  the  transfer 

//  isprocessing  -  to  check  the  processing  status 

//  finishtime  -  to  check  if  time  complete 

//  getnextinstandprocess-  to  get  the  next  inst  and  process  it 

void  xferproc  ::  processbus(schprocessor  ^scheduler.ioprocessors  *iopiist, 
approcessors  *aplist,gmprocessors  *gmlist)  { 
if  (waitingonxferatlocs)  { 

if  (xferobject.checklocsxfering(scheduier,ioplist,aplist,gmlist))  { //  do  nothing 

} 

else  { 

waitingonxferatlocs  false; 

xferobject.commencexfer(scheduler,ioplist,aplist,gmlist); 

}; 

} 

else  { 

if  ((xferobject.isprocessingO)  &&  (xferobject.finishtimeO))  { 
xferobject.  completexfer(scheduler,ioplist,aplist,gmlist) ; 

}; 

if  (xferobject.isprocessingO  =  =  false)  { 
getnextinstandprocess(scheduler,ioplist,aplist,gmlist); 

}; 

}; 

}; 

void  xferproc  ::  getnextinstandprocess(schprocessor  *scheduler,ioprocessors 
*ioplist,approcessors 

*aplist,gmprocessors  ^gntlist)  { 
if  (xferobject.  getnexmodeO)  { 

if  (xferobject.checklocsxfering(scheduler,ioplist,aplist,gmlist))  { 
waitingonxferatlocs  =  true; 

} 
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else  {xferobject.commencexfer(scheduler,ioplist,aplist,gnilist); 

}; 
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APPENDIX  I:  RESULT  GENERATION  CODE 


//  Description  :  The  code  included  in  this  appendix  represents  the  code  added  to  the 
simulator 

//  :  to  keep  track  of  desired  output  data. 

void  gnode  ::  calcnodeinsttime(int  nodenumber, int  nodeinststart, 

int  nodeinstfinish)  { 
gnode  *tempgnodeptr  -  gnodeiisting; 

while  ((tempgnodeptr->nodeid  !=  nodenumber)  8l8l  (tempgnodeptr  !=  NULL))  { 
tempgnodeptr  =  tempgnodeptr- >nextgnode; 

}; 

if  (tempgnodeptr  =  =  NULL)  { 

cerr  <  <  "\nERROR  CALCXJLATING  NODE  INSTANCE  TIME\n"; 

} 

else  { 

printf("Off  RL  node  number  "); 
printf("  %d" , nodenumber); 
printf("  instance:  "); 

printf("  ?6d*,  tempgnodeptr- >  lastinstoffrl); 
printf("  Qock:  "); 
printf("  %ld"  .clock); 
printf("\n"); 

if  ((tempgnodeptr- >lastinstoffrl  >  =  nodeinststart)  && 

(tempgnodeptr- >lastinstoffrl  <=  nodeinstfmish))  { 
tempgnodeptr-  >  timenodeinstonrl[tempgnodeptr-  >  lastinstoffrl  - 
nodeinststart]  =  clock  -  tempgnodeptr-  > 
timeinstontorl[tempgnodeptr-  > 
lastinstoffrl  -  nodeinststart]; 

}; 

(tempgnodeptr-  >  lastinstoffrl) + + ; 

}; 

}; 

void  gnode  ::  calcallnodeinstavgtime(int  nodeinststart,  int  nodeinstfinish)  { 
gnode  ^tempgnodeptr  =  gnodeiisting; 
while  (tempgnodeptr  !  =  NULL)  { 
printf("\nNode  ID;  "); 
printf("  %d"  .tempgnodeptr-  >  nodeid); 
if  (tempgnodeptr- >iopidassigned  !=  0)  { 
printf("  is  an  lOP  Node\n"); 

} 
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else  { 

printf(-\n"); 

//  NOTE  ARRAY  IS  ASSUMED  TO  BE  IN  BOUNDS 
for  (int  count=nodeinststart;count<nodeinstfinish;count++)  { 
tempgnodeptr->avgtimeotirI  =  tempgnodeptr- >  timenodeinstonrl[count- 
nodeinststart]  +  tempgnodeptr- >avgtimeonrI; 
tempgnodeptr- >avgtimeatap  =  tempgnodeptr- >  timenodeinstatap[count- 
nodeinststart]  +  tempgnodeptr- >avgtimeatap; 
prmtf("Instance:  "); 
printf("  %d"  .count); 
printf("  Time  on  rl:  "); 

printf(”%Id", tempgnodeptr- >timenodeinstonrI[count  -  nodeinststart]); 
printf("  Time  between  SIS  and  BD  Completion:  "); 
printf("%ld" .tempgnodeptr- >timenodeinstatap[count  -  nodeinststart]); 
printf("\n"); 

}; 

tempgnodeptr- >avgtimeonrl  =  tempgnodeptr- >avgtimeonrI  /  (nodeinstfinish 

-  nodeinststart); 

tempgnodeptr- >avgtimeatap  =  tempgnodeptr- >avgtimeatap  /  (nodeinstfinish 

-  nodeinststart); 
printf(" Average  time  on  rl:  "); 

printf("  %ld"  .tempgnodeptr-  >  avgtimeonrl); 
printf("  Average  time  between  SIS  and  BD  Completion:  *); 
printf("  %  Id"  .tempgnodeptr-  >  avgtimeatap); 
printf("\n"); 

}; 

tempgnodeptr  =  tempgnodeptr- >nextgnode; 

}; 

}; 

void  gqueue  ::  calcqueuetimes(int  nodeinststart,int  nodeinstfinish)  { 
gqueue  '*tempgqueueptr  =  gqueuelisting; 

while  (tempgqueueptr  !  =  NULL)  { 
printf("\nQueue  ID:  "); 
printf("  %d"  .tempgqueueptr-  >  gqueueid); 
if  (tempgqueueptr- >  nodein  ==  -1)  { 
printf(”  is  an  external  input  queue\n"); 

} 

else  { 

if  (tempgqueueptr- >  nodeout  ==  -1)  { 
printf("  is  an  external  output  queue\n"); 

} 

else  { 
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prmtf("\n"); 

//  NOTE  ARRAY  IS  ASSUMED  TO  BE  IN  BOUNDS 
for  (int  count=:nodeinststart;count<nodeinsdinish,‘count++)  { 
printf("Instance:  "); 
printf("  %<i"  .count); 

printf("  Time  between  BD  Completion  and  Successor  on  RL;  "); 
piintf("%Id”,tempgqueueptr->timebtwnbdrl[count  -  nodeinststart]); 
printf("\n"); 

}; 

}; 

}; 

tempgqueueptr  =  tempgqueueptr- >  nextelement; 

}r 

}; 

void  gnode  ::  settimearratap(int  nodenumber.int  nodeinststart, 
int  nodeinstiinish)  { 
gnode  ’*tempgnodeptr  =  gnodelisting; 

while  ((tempgnodeptr->nodeid  !*  nodenumber)  Sc&  (tempgnodeptr  !=  NULL))  { 
tempgnodeptr  »  tempgnodeptr- >nextgnode; 

}; 

if  (tempgnodeptr  **  =  NULL)  { 
cerr  <  <  "\nERROR  SETTING  ARRIVAL  TIME  AT  AP\n"; 

} 

else  { 

if  ((tempgnodeptr- >lastinstoffTl  >  =  nodeinststart)  && 

(tempgnodeptr- >lastinstoffrl  <=  nodeinstiinish))  { 
tempgnodeptr- >timeinstarratap[tempgnodeptr->]astinstoffTl  - 
nodeinststart]  =  clock; 

}; 

}; 

}; 

void  approcessors  ::  calcaunotbusytime(approcessors  ’•tempaplist)  { 
while  (tempaplist !  =  NULL)  { 
printf("\nAP  Number;  "); 
printf("  %d"  .tempaplist-  >  apsetup.getobjectidO); 
printf("  Time  AP  AU  NOT  Busy:  "); 
printf("  %  Id"  .tempaplist-  >  aunotbusytime); 
printf("\n"); 

tempaplist  =  tempaplist- >nextapproc; 

}; 

}; 

void  gqueue  ::  setcompbdtime(int  loctoset)  { 
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tiinecompbdQoctoset]  =  clock; 


}; 

void  ptrtoptrtoaq  ::  setcoinpbdtime<int  loctoset) 
ptrtoaq-  >  setcompbddmeOoctoset); 


{ 


}; 

void  gqueue  ::  calccompbdrltime(int  loctocalc)  { 

timebtwnbdrlQoctocalc]  =  clock  -  timecompbdQoctocalc]; 


}; 

void  ptrtoptrtoaq  ::  calccompbdrltime(int  loctocalc) 
ptrtoaq-  >  calccompbdrltimeOoctocalc); 


{ 


}; 

void  gnode  ::  calctimd>twnbdrl(int  nodenumber,int  nodeinststart, 

int  nodeinstfinish)  { 


gnode  *tempgnodeptr  =  gnodelisting; 
ptrtoptrtoaq  *tempptrtoptr; 

while  ((tempgnodeptr->nodeid  !=  nodenumber)  &&  (tempgnodeptr  !=  NULL))  { 
tempgnodeptr  =  tempgnodeptr- >nex^node; 

}; 

if  (tempgnodeptr  *  =  NULL)  { 

cerr  <  <  "\nERROR  CALCULATING  TIME  BETWEEN  BREAKDOWN  AND 
READYUS’Rn"; 


} 

else  { 

tempptrtoptr  =  tempgnodeptr- >ptrtoinqlist; 
if  ((((tempgnodeptr-  >  lastinstoffrl)  -  1)  >  =  nodeinststart)  && 

(((tempgnodeptr-  >  lastinstoffrl)  -  1)  <  nodeinstfinish))  { 
while  (tempptrtoptr  !=  NULL)  {  //-I 

tempptrtoptr-  >  calccompbdrltime((tempgnodeptr-  >  lastinstoffrl)  - 
nodeinststart); 

tempptrtoptr  =  tempptrtoptr- >getnextelementO; 

}; 

}; 

}; 

}; 

void  gnode  ::  calctimeinstatap(int  nodenumber, int  nodeinststart, 
int  nodeinstfinish)  { 
gnode  ^tempgnodeptr  =  gnodelisting; 
ptrtoptrtoaq  ’^tempptrtoptr; 

while  ((tempgnodeptr- >nodeid  !=  nodenumber)  &&  (tempgnodeptr  !=  NULL))  { 
tempgnodeptr  =  tempgnodeptr- >nextgnode; 

}; 

if  (tempgnodeptr  =  =  NULL)  { 

cerr  <  <  "\nERROR  CALCULATING  NODE  INSTANCE  TIME\n"; 


118 


} 

else  { 

if  ((((tempgnodeptr->lastinstoffrl)  -  1)  >  =  nodeinststart)  && 
(((tempgnodeptr->Iastinstoffrl)  -  1)  <=  nodeinstfinish))  { 

//  NEXT  FOUR  LINES  ARE  NEW  EXPERIMENTAL  AS  OF  8/28/91 
tempptrtoptr  =  tempgnodeptr->ptrtooutqlist;  //  ??INQLIST  OR  OUTQUST 
while  (tempptrtoptr  !  =  NULL)  { 
tempptrtoptr- >setcompbdtime((tempgnodeptr->Iastinstoffrl)  -  1  - 
nodeinststart); 

tempptrtoptr  =  tempptrtoptr- >gemextelementO; 

}; 

tempgnodeptr- >  timenodeinstatap[(tempgnodeptr- >  lastinsto^l  -  1)  - 
nodeinststart]  =  clock  -  tempguodeptr-  > 
timein5tarratap[(tempgnodeptr-  > 
lastinstoffrl  - 1)  -  nodeinststart]; 

}; 

}; 

}; 
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